Worldwide weapons


Progress towards a United Nations arms-trade treaty is encouraging, but it won’t keep weapons out of the hands of human-rights abusers.


Scientists are prominent among those trying to make the world a
safer place. Albert Einstein was committed to the international
peace effort, and Soviet nuclear physicist Andrei Sakharov and
US chemist Linus Pauling are among the researchers who have been
awarded the Nobel Peace Prize. The Polish-British nuclear physicist
Joseph Rotblat received the peace prize in conjunction with the Pugwash
Conferences on Science and World Affairs, the nuclear-disarmament
organization that he helped to found (see Nature 481, 438-439; 2012).
The attitudes of these researchers chime with the internationalism of the
scientific endeavour and the humanitarian goals that often motivate it.

At the same time, science and technology are integral to military
development, and defence funding supports a great deal of research,
much of it excellent. There need be no contradiction here: nations have
a right to self-defence, and armed forces are often deployed for
peace-keeping as well as for aggression. But what constitutes responsible
use of military might is controversial, and peace-keeping is generally
necessary only because aggressors have been supplied with military
hardware in the first place.

All of this makes arms control a thorny subject for scientists. When,
at a session on human rights at a physics conference several years ago,
Nature asked whether the link between the arms trade and
human-rights abuses might raise ethical concerns about research on offensive
weaponry, the panellists shuffled in their seats and became tongue-tied.

There is no easy way to demarcate the ethical boundaries of defence
research. But scientists should welcome progress towards a binding
United Nations Arms Trade Treaty (ATT), for which a preparatory
meeting in New York this week presages final negotiations in July.

The treaty aims to align the conditions and standards for arms
exports from all signatory countries. The UK government's Foreign
and Commonwealth Office (FCO), which supports the ATT, says that
inconsistencies and loopholes in current regional and national control
systems for the arms trade hinder sustainable development,
undermine stability and democracy, and impede progress towards the UN’s
Millennium Development Goals.

Some nations will attempt to have the treaty watered down. That the
sole vote against the principle at the UN General Assembly in October
2009 was from Zimbabwe speaks volumes about probable reasons for
opposition. But let us not overlook the fact that in a vote a year earlier,
Zimbabwe was joined by one other dissenter: the United States, then
led by President George W. Bush. Would any of the current leading US
Republican candidates for president be better disposed towards an ATT?

Even if it does go ahead, a treaty will not necessarily change the arms
trade much. Most of the military technology used for human-rights
abuses in recent decades has been obtained legally. Sales from Britain,
for example, helped Libya’s former leaders to suppress ‘rebels’ in 2011
and enabled Zimbabwe to launch assaults in the Democratic Republic of
Congo in the 1990s.

The UK government admits that the ATT will probably not reduce
arms sales. It says that the criteria for exports “would be based on 
existing obligations and commitments to prevent human rights abuse” 
— which have not been notably effective. According to the FCO, the 
treaty aims “to prevent weapons reaching terrorists, insurgents and
human rights abusers”. But as demonstrated in Libya, one person’s
insurgents are another’s democratizers, and today’s legitimate rulers can
be tomorrow’s human-rights abusers.

The FCO says that the treaty “will be good for business, both
manufacturing and export sales”. Indeed, arms manufacturers support it as
a way of levelling the market playing field. The ATT could simply
legitimize business as usual by clearly demarcating it from the black
market, and it will not cover peripheral military hardware such as
surveillance and information technology. Some have argued that the
treaty will be a distraction from the problem of keeping arms from
human-rights violators (D. B. Kopel et al. Penn State Law Rev. 114,
101-163; 2010).

So although there are good reasons to call for a strong ATT, it is
no panacea. The real need is to establish what a responsible arms
trade would look like, if this isn’t an oxymoron. Some hard research is
required on how existing, ‘above-board’ arms sales have affected
governance, political stability and socio-economic conditions worldwide.
This is challenging and contentious, but several starts have been made,
both in the UN and nationally (see, for example, www.unidir.org and
www.prio.no/nisat). We need more.


Tough choices 


Scientists must find ways to make more efficient
use of funds — or politicians may do it for them.


Scientists in the United States can find plenty of good news as they
page through President Barack Obama's 2013 budget proposal.
Despite substantial cuts elsewhere — and fierce pressure from
Republicans to cut more — Obama called for healthy overall increases
in both fundamental research and science education (see page 283).
But the good news, of course, is tempered by reality. Obama’s budget
document is one long struggle to balance two contradictory goals: to
stimulate the lagging US economy and to curb the annual budget deficit,
which is more than US$1 trillion. Science and science education are
widely viewed as helping with the first, and will doubtless continue to
be seen as such no matter who wins November's presidential election.

The idea that science is a driver of prosperity is one of the few things on
which the United States’ bitterly divided political parties still agree. But 
the science funding agencies themselves are by no means immune to the 
second goal. The harder the cuts bite, the more those agencies will have 
to streamline their operations and merge or terminate programmes. 
This week’s budget proposal, which contains many references to
“tough choices”, shows that this process is already well under way. The
Department of Energy (DOE), for example, wants to discontinue funding
of several dozen projects that have not met their research milestones,
or that seem otherwise unpromising. The National Science Foundation 
(NSF) is likewise cutting back on some $66 million in lower-priority 
education, outreach and research programmes. The National Institutes 
of Health (NIH) has been ordered to pursue “new grant management 
policies” to increase the number of new grants by 7%. And NASA is 
being obliged to make drastic cuts to its Mars exploration programme 
so as to finish building its flagship James Webb Space Telescope. 
Conceivably, this process could get even more drastic. Last month, 
Obama asked Congress to give him the authority to consolidate and 
streamline agencies on his own initiative — and suggested that one early 
application would be to transfer the National Oceanic and Atmospheric 
Administration from the Department of Commerce to the Department 
of the Interior. If Congress were to give Obama that power, it is possible 
to imagine him — or some future Republican president — sending all of 
the NSF’s science-education programmes to the Department of Education,
or merging the DOE's particle and nuclear physics research into the
NSF, under the guise of making management of science more efficient.
White House officials insist that no one in the administration is even 
contemplating such a wholesale restructuring. But the arithmetic of the 
deficit is unavoidable. Individual researchers, scientific societies and 
science funding agencies can no longer afford to be purely reactive,
responding to each cut as it comes along. They need to be part of the
debate, thinking systematically about how programmes and even whole
agencies could be restructured to make them more efficient at using the
scarce funds available, and more effective at promoting the best science.

To do that, and to address the increasing demands from politicians 
and voters for evidence that fundamental research is useful, scientists 
must also find better ways to measure the effectiveness of the nation’s 
investments in science. The usual technique is to insist that principal 
investigators produce more and more reports, which tends to be a waste 
of everyone's time. A consortium of six universities called Star Metrics, 
launched in 2010 and headquartered at the NIH, has shown that it is 
possible to do better by using natural language processing and other 
tools to mine the data and reports that the agencies already collect. 
But even that is just a beginning. Researchers and research institutions 
need to help to devise still better measures — because if they don't do it 
themselves, politicians and others who know much less about science 
may very well do it for them. And who knows where that would end.


On the up 


The soaring incidence of diabetes is driving the 
United Arab Emirates’ science ambitions. 


In 1990, Dubai’s desert skyline was flat. Literacy rates were low and
the prevalence of diabetes was a modest 6%. The effects of oil-field
wealth — both good and bad — had yet to kick in. Now the skyline
soars with elegant skyscrapers, among them the world’s tallest, the
830-metre-high Burj Khalifa. Literacy rates have risen with them to
more than 90%, thanks to the enlightened policies of the ruling families
in the United Arab Emirates (UAE). Unfortunately, as the residents
of the federated country settled into sedentary, well-fed lifestyles, the
prevalence of diabetes also soared — to well over 20%.

This is one reason why UAE science minister Sheikh Nahayan
Mabarak Al Nahayan, who spoke to Nature earlier this month, sees
merit in his country joining Europe's biobanking network, the
Biobanking and Biomolecular Resources Research Infrastructure (BBMRI).

The BBMRI collects and shares standardized genetic and medical
information on national populations. It is a long-term mega-project
with a focus on complex diseases — including diabetes — that are
caused by multiple genetic and environmental factors, and can be
understood only by studying large numbers of people. Membership
would support the UAE’s focus on its principal medical problem, while
building science capacity to international standards, one of its
government’s stated goals. For its part, the BBMRI is keen to add the
emirates and their scientifically valuable population to the network.

Securing funding for the initiative won't be easy, however. The UAE
comprises seven emirates, each ruled by its own royal family. Abu
Dhabi is the largest by area — and, as home to most of the country’s oil
fields, the richest — whereas Dubai is the largest in terms of
population. Between them, the royal families of these two emirates share the
bulk of power in the federal government, which has a limited budget,
mostly contributed by Abu Dhabi.

This complex political constellation has hindered the UAE’s
ambition to become a scientific force. For example, in 2008 the federal
government created the National Research Foundation (NRF) to support
and encourage competitive research. Its annual budget was set at
100 million UAE dirham (US$27 million) and it opened calls for centres 
of excellence and for individual investigators. Winners, identified by 
international peer review, were congratulated with great fanfare, but 
most of the promised funds for the NRF never arrived. The culture of 
science is new and foreign to UAE rulers, who are used to buying in 
whatever they need. 

The process did at least provide the NRF with an effective audit of 
the small UAE research base. This has helped it to lobby successfully for 
money from industry, foundations and states on a case-by-case basis 
for some of the winners whose projects most visibly align with national 
interests, such as research on water resources. This is not how an
independent research agency should operate, but positive feedback from the
funded projects may eventually encourage proper funding of the NRF.

The NRF exercise also identified genetics and disease as a key 
research strength in the emirates, and similar lobbying is likely to see 
a centre of excellence established in this field. This should give hope 
to UAE scientists interested in participating in the BBMRI. 

Those scientists met their European colleagues for first discussions 
earlier this month in Dubai, where they will also hold a scientific
symposium in October. A formal proposal for funding will emerge from
the symposium, and the UAE government would be wise to support 
it. UAE researchers say that the state has three natural resources: oil 
and gas, dates and camel milk. In decades to come, the oil and gas will 
run out. Membership of the BBMRI could provide a short cut in the 
long process of building a knowledge-based economy. 

The other Gulf states share the UAE’s predisposition to diabetes, as 
well as its dependence on oil and gas reserves. They could also benefit 
from joining the project, and extending the Middle Eastern population 
base would make the research more powerful. 

Will that happen? The Gulf states tend to be very competitive. 
Unnerved by the Burj Khalifa, Saudi Arabia planned to build its
Kingdom Tower to be 1,000 metres. And Kuwait's Mubarak al-Kabir Tower
is proposed to be 1,001 metres. The BBMRI
offers an alternative model, an umbrella under
which participants cooperate and share. This is
a different mindset, but one that is necessary
to resolve the region's shared health problems.
Spanish changes are
scientific suicide

If research continues to be sidelined, Spain will be left with little domestic
expertise, warns Amaya Moro-Martin.

Spain no longer has a ministry of science. In the last days of 2011,
its new government transferred national science policy to the
Ministry of Economy and Competitiveness, a duty for which
this ministry seems most unsuited. Science was an unwelcome
addition that absorbed more than half of the €1,083-million
(US$1,438-million) budget cut imposed on the ministry. This sends an alarming
signal of the sacrifices that science may face when the government
releases its budget for 2012 next month.

This is the first time that neither ‘science’ nor ‘research’ has featured
in the name of any top Spanish government department. It is not just a
symbolic shift: it continues our country’s trend of deliberately
undermining and playing down the importance of science.

The official line is clear: science is not a priority in Spain. Of course,
we are immersed in an economic crisis and
austerity measures are needed. However, the
government's irrational and draconian actions
will cause long-term damage to the scientific
infrastructure and send contradictory messages
to other countries and investors. Although its
rhetoric promises a shift to a knowledge-based
economy, every step it takes is in the opposite
direction. The result will be a borrowed-knowledge
economy with little domestic know-how.

The problems did not start with the new
government: the previous administration
attempted to pass a Kafkaesque by-law for public
universities that would have created a merit-evaluation
system that diminished the weight
assigned to research and technology transfer. The
by-law stated that trade unions would negotiate
the criteria for faculty promotion, making academic
careers “more predictable and more egalitarian”. It would have
been the death of meritocracy. The same by-law would also have
ballooned bureaucracy to such a level that it would have threatened to
swamp any university administration.

The previous government also opposed attempts to create a genuine
tenure-track system for researchers in universities and national
laboratories, on the grounds that tenure track is unconstitutional because
access to civil service should be “egalitarian”, so tenured jobs should not
be targeted to tenure-track researchers. This is a consequence of the
narrow-minded thought that all researchers in the public sector should
be civil servants, but civil service is unsuited to research activities.

Spain likes to boast that it has an equivalent to tenure track: the
Ramon y Cajal programme. Launched in 2001, this is the only
nationwide programme that has managed to
attract and retain highly qualified researchers
from Spain and abroad. However, drastic cuts
in hiring over the past three years and a hiring
freeze announced this year will kill this first
attempt at a tenure-track programme. The prospects are so grim that
despite being eager to return to Spain, some of my Spanish colleagues
in the United States are rejecting Ramon y Cajal positions.

The hiring freeze is suicidal. Researchers who retire will no longer
be replaced. Unlike many of its neighbours, Spain has a very limited
science and technology industry in which to absorb highly qualified
workers, so scientists aged 20-40 years will have no choice but to leave
if they want to further their career. The country will therefore face a
multigenerational brain drain, with corresponding losses in
innovation, inspiration and credibility. The damage from this decision will
take decades to reverse.

The new government is now effectively trampling on the best hope 
that Spanish researchers had for the future. Legislation in the pipeline 
could have improved the situation, but the
government has, abruptly and without explanation,
closed the two parliamentary science commissions
— one in the Senate and one in the Congress —
that would have been responsible for steering
through this legislation.

The legislation includes moves to allow
universities and research centres to be funded privately,
to develop a new science and technology strategy
and to create a proper national research agency
with a multi-year budget. We urgently need such
a system in Spain, where severe and
unpredictable fluctuations in year-to-year funding make
medium- to long-term planning impossible.
The strategy is crucial if Spain is to coordinate its
increasingly anarchic 18 sets of science policies
— laid out simultaneously by the 17 regional
governments and the central government — and to
introduce a smarter, top-down approach to tackling national problems.

Spain must bring its science and technology investment (currently 
1.39% of gross domestic product) in line with European standards 
(2%) and closer to the 3% goal set by the European Council Lisbon 
Strategy for 2010. It also needs a science council, similar to the
German Wissenschaftsrat, constituted mainly of scientists who have been
elected by the scientific community to take the lead in delivering the 
national science and technology strategy. 

Spain's situation is summed up by a poster for a recent Hollywood 
blockbuster: “No plan. No backup. No choice. Mission: Impossible. 
Ghost Protocol.” Spanish science cannot afford ghost protocols. With- 
out the proposed strategy there is no plan, and without a well-funded 
and non-political national research funding agency, there is no backup. 
The results leave research in Spain with a mission impossible.


Amaya Moro-Martin is a Ramon y Cajal Fellow at the Spanish 
National Research Council in Madrid. 
e-mail: amaya@cab.inta-csic.es 
Selections from the
scientific literature 


A rainy signal
from noise


A temperature increase of at least 1.4°C is needed before changes in regional precipitation can be distinguished from regular variability and attributed to global warming.

A team led by Irina Mahlstein of the National Oceanic and Atmospheric Administration in Boulder, Colorado, used a suite of general circulation models to analyse regional precipitation trends from 1900 to 2099. The analysis focused on wet seasons, for which models performed most accurately against historical trends.

By the end of this century, the study suggests, increases in wet-season precipitation will be apparent in many areas. However, the authors note that changes in extreme weather or annual precipitation might be detectable much earlier.
Geophys. Res. Lett. http://dx.doi.org/10.1029/2011GL050738 (2012)


Nanoscale shells
trap light

Sheets of silicon nanoshells created by a team in California could lead to ultra-thin solar panels that are cheaper and easier to mass-produce than those currently available. Conventional solar panels absorb light using relatively thick layers of nanocrystalline silicon that can be time-consuming to manufacture. Yi Cui and his colleagues at Stanford University manufactured spherical, hollow silicon shells using standard chemical techniques and deposited them on a sheet (pictured). Light captured by the material was reflected many times inside the shells, increasing the amount of energy the sheet absorbed. The team found that a 50-nanometre-thick layer of shells was as efficient as a 1-micrometre-thick sheet of conventional silicon.
Nature Commun. 3, 664 (2012)

MICROBIOLOGY

Seal corpses shelter Antarctic microbes

Mummified seals scattered across the deserts of Antarctica’s McMurdo Dry Valleys reveal that microbial communities in the region respond rapidly to environmental change.

The seal carcasses are naturally mummified by the extremely dry, cold conditions of one of the world’s least hospitable climates. Craig Cary at the University of Waikato in Hamilton, New Zealand, and his colleagues found that undisturbed carcasses boost humidity, stabilize temperature and alter the microbial communities in the soils beneath them. The researchers assayed how quickly those communities changed by transplanting a 250-year-old seal carcass to a pristine location. Two summers later, the microbial composition of the soil beneath the transplanted seal resembled that of the seal’s original location. This challenges the hypothesis that the region's soil ecology changes only over the course of centuries.
Nature Commun. 3, 660 (2012)

Sex is spread
across the genes

Sex-specific behaviours in activities such as mating and parenting are controlled in a modular way by distinct sets of genes.

Nirao Shah at the University of California, San Francisco, and his colleagues screened the brains of male and female mice for differences in gene expression. They identified 16 genes differentially expressed in the hypothalamus and amygdala, brain regions implicated in the control of sex-related behaviours. Sex hormones, which drive behavioural differences between the sexes, exert their effects by regulating the expression patterns of these genes.

Mice deficient in one of the genes demonstrated subtle differences in particular sex-specific behaviours, such as female acceptance of — or male interest in — penetration, without affecting other sex-typical behaviours.
Cell 148, 596-607 (2012)


Fish figures hint 
at past extinctions 


Contrary to the popular saying, there are not plenty of fish in the sea. But why? Perhaps because a huge number of species became extinct in ancient times, say Greta Carrete Vega and John Wiens at Stony Brook University in New York.

Marine environments cover about 70% of Earth’s surface but contain only 15-25% of all estimated species. To find out why, Vega and Wiens studied actinopterygian (ray-finned) fish — which encompass 96% of Earth's fish species — in marine and freshwater environments.

They found that both environments were similarly rich in actinopterygian species, even though the marine environment is much larger and has greater primary productivity. They also discovered that all extant marine actinopterygians descend from a freshwater ancestor, suggesting that ancient extinctions have robbed the seas of their species.
Proc. R. Soc. B http://dx.doi.org/10.1098/rspb.2012.0075 (2012)


More super-hot 
summers ahead 


Summer temperatures once considered exceptionally high have, in recent decades, become more frequent across the United States as a result of anthropogenic climate change. Philip Duffy, currently at the Lawrence Livermore National Laboratory in Livermore, California, and Claudia Tebaldi at the National Center for Atmospheric Research in Boulder, Colorado, compared summer temperature extremes from 1950 to 1999 with simulations derived from 16 global climate models. Model projections suggest that US summer temperatures will continue to rise as the century progresses.

Even in regions that have warmed relatively little so far, the chances of extreme temperatures — seen only once in 20 years in the second half of the past century — will be at least 70% in any given year by 2064.
Clim. Change http://dx.doi.org/10.1007/s10584-012-0396-6 (2012)


Six-faced 
particles 


Janus particles, named after the two-faced Roman god, are solid particles of two halves, each with different physical properties and with applications that include drug delivery. But Shoji Takeuchi and his colleagues at the University of Tokyo have gone beyond Janus’s two faces and made gel spheres with up to six distinct sections. The particles are about 100 micrometres in diameter and, when each section is permeated with different fluorescent nanobeads, look like beach balls.

To make them, the researchers injected dyed sodium alginate solutions down a multi-barrelled capillary tube, which they then centrifuged. This forced the liquid streams out to form droplets composed of multiple liquids. The droplets fell into a waiting bath of calcium chloride, turning them to gel before the component solutions could mix.

The researchers also used their method to produce Janus particles holding magnetic particles and living cells in their separate halves (pictured).
Adv. Mater. http://dx.doi.org/10.1002/adma.201102560 (2012)
COMMUNITY CHOICE


T-cell retreat in chronic hepatitis C 


After acute infection with hepatitis C, some 
people recover whereas others develop 
chronic disease. Contrary to expectation, 
the latter group launches the same initial 
immune response against the virus as the former. 

Recovery from hepatitis C infection was thought to be 
heralded by a broad response by CD4+ T cells, which had
not previously been detected in most patients with chronic 
disease. But Georg Lauer of Massachusetts General Hospital 
in Boston and his colleagues detected these cells in the blood 
samples of 31 patients with acute infection, including 13 who 
later advanced to chronic disease. The researchers cultured 
the immune cells and, with the aid of sensitive fluorescent 
labelling, showed that the cells later disappeared from samples 
taken from the chronic disease group. 

Early antiviral therapy can prevent the loss of CD4+
T cells, suggesting a possible means by which to prevent the 
development of chronic infection. 
J. Exp. Med. 209, 61-75 (2012) 
Stripes from
shifting cells

Repulsion between pigment cells helps to explain how adult zebrafish develop the stripes for which they are named.

Shigeru Kondo and his colleagues at Osaka University in Japan looked at cultured black melanophore and yellow xanthophore pigment cells from the animals (pictured top). They found that when the black pigment cells came into contact with the yellow ones, their membrane potential changed, shifting the charge at the cells’ surface. In 60% of such encounters, the melanophores moved away from the xanthophores. However, in mutant fish that lack regular stripes (bottom), the two cell types always stayed in contact with each other.

The authors say that although repulsion alone is not sufficient to explain pattern formation, repulsion could be involved in the stripes’ development.
Science 335, 677 (2012)
SEVEN DAYS


US budget hopes 


US President Barack Obama 
proposed slight increases for 
most major science agencies 
in his 2013 budget request, 
released on 13 February. The 
country’s largest research 
agency, the National 
Institutes of Health, saw its 
budget held level. NASA, 
meanwhile, looks set to lose 
out, with cuts of 3.2% to its 
science budget and 21% to 
planetary science — leading 
the agency's administrator 
Charles Bolden to cancel plans 
for joint Mars missions with 
the European Space Agency. 
However, the forthcoming 
presidential election and 
existing agreements to trim 
government spending mean 
that next year’s science budget 
remains very uncertain. See 
pages 283-285 for more. 


Elsevier boycott
The number of researchers who have signed a public pledge not to support academic-publishing giant Elsevier, headquartered in Amsterdam, passed 5,000 last week. Mathematician Timothy Gowers at the University of Cambridge, UK, began the boycott with a 21 January blog post opposing the company’s practices, which he says hinder the dissemination of research. By 14 February, 5,847 had signed an online petition. A similar campaign in 2000-01, which attracted 30,000 signatories, was connected to the founding of the Public Library of Science publishing venture headquartered in San Francisco, California. See go.nature.com/uzkmay for more.


Nuclear clean-up
The Japanese government has threatened to withhold about ¥1 trillion (US$12.8 billion) in rescue funds for the private company that runs the stricken Fukushima Daiichi nuclear power plant, unless it gets more say in the firm's operations. On 13 February, Japan’s energy and trade minister, Yukio Edano, did approve ¥690 billion for the Tokyo Electric Power Company (TEPCO), which is currently struggling to pay compensation costs and clean up the plant after it was hit by a tsunami last year. But, Edano said, the larger separate bailout would depend on TEPCO ceding partial control to the government.


Putin's subglacial sample

Russia’s Arctic and Antarctic Research Institute in St Petersburg confirmed on 8 February that scientists have managed to drill 3,769 metres through Antarctica’s ice sheet to reach the subglacial Lake Vostok. The breakthrough was made on 5 February, the institute said. By 10 February, Russia's prime minister, Vladimir Putin, had acclaimed the discovery, and was presented on national television with a sample of yellowing water (pictured) — although the water was probably from melted ice at the bottom of the borehole, not from the lake itself. The Russian drilling team has now left the borehole until next summer (in December), when they will return to do further analysis. See page 287 for more.


US nuclear approval
The United States has given the green light to its first new nuclear reactors since 1978. On 9 February, the US Nuclear Regulatory Commission (NRC) approved an application by utility giant Southern Company, based in Atlanta, Georgia, to build two pressurized-water nuclear reactors at its Vogtle station near Waynesboro. The company said the reactors could be operating by 2017. However, a US nuclear renaissance seems unlikely, because few other reactor proposals are in the pipeline. See go.nature.com/tws1loz for more.


BUSINESS
Biosimilars rules
Drug-makers keen to sell generic forms of branded biological drugs — such as enzymes and antibodies — were excited to finally see draft guidance on the matter emerge from the US Food and Drug Administration (FDA) on 9 February. Proteins are large and complex, so it is much harder to copy drugs based on them than small-molecule drugs (see Nature 449, 274-276; 2007). The FDA wants firms to prove their molecules’ similarity to branded biologics before the generics can be approved — but the agency provided few concrete details, instead saying that it would judge on a case-by-case basis. See go.nature.com/nhbvik for more.


Illumina takeover
Illumina, the dominant developer of DNA-sequencing technology, has, as expected, rejected a US$5.7-billion takeover bid by drug giant Roche, based in Basel, Switzerland. On 7 February, the board of directors of Illumina, which is headquartered in San Diego, California, said that the 25 January offer was “grossly inadequate”, undervaluing the company’s prospects. Roche replied that its bid was “full and fair”. Roche is now expected to start wooing Illumina’s major investors to accept a takeover — even though Illumina’s share price is currently well above Roche’s bid.


Synbio troubles
US synthetic-biology firm Amyris — which engineers microbes to process plant sugars into useful chemicals — saw its share price plunge by 28% on 10 February, after it admitted that it would not meet its production forecasts for a key chemical, farnesene. The firm, based in Emeryville, California, also said that it would not use its own microbe vats to produce synthetic fuels, leaving high-volume efforts to oil company Total, based in Paris, and biofuel firm Cosan in São Paulo, Brazil, with which it has formed joint ventures.


RESEARCH
Vega launches
Europe's Vega rocket, a low-cost launcher intended to get small scientific satellites into low-Earth orbit, had a successful maiden flight on 13 February. The inaugural launch, from the European Space Agency’s spaceport in Kourou, French Guiana, carried nine satellites; its main research payload was the Italian Space Agency’s Laser Relativity Satellite (LARES, pictured: sphere on top of the rocket’s payload), which will study the Lense-Thirring effect, a distortion of space-time caused by the rotating Earth. The Vega rocket has cost more than €700 million (US$924 million) to develop; five further flights are planned before 2016. See go.nature.com/srl2fb for more.


TREND WATCH
Brazil has continued its rapid rise in planting of commercial genetically modified (GM) crops. The country, which is the world’s second-largest adopter of such crops, grew 30.3 million hectares of GM soya, maize (corn) and cotton last year, a 19% increase on 2010. Argentina — which plants similar crops and is the third-largest adopter — crept up 3% to 23.7 million hectares. Both stay well behind the United States, which planted 69 million hectares in 2011. In total, 29 countries now plant GM crops.


LHC schedule
On 13 February, operators of the world’s most powerful particle accelerator announced their plan for its 2012 run, which starts in March. The new schedule calls for the Large Hadron Collider (LHC), located near Geneva, Switzerland, to smash protons together at an energy of 8 teraelectronvolts (TeV), an increase of 1 TeV over the previous year, but still well short of the 14-TeV collision energy that the collider was originally designed to reach. The team expects the LHC to produce around 1,600 trillion proton-proton collisions this year, a threefold increase over 2011. See go.nature.com/xivplh for more.


Denisovan genome
The complete genetic sequence of an extinct relative of humans — the Denisovan — was posted online (see go.nature.com/wvtcfi) on 6 February, allowing others to download the data while the work awaits formal journal publication. Researchers from the Max Planck Institute for Evolutionary Anthropology in Leipzig, Germany, mapped every position in the genome an average of 30 times, improving on the 1.9-fold coverage in their 2010 draft genome (D. Reich et al. Nature 468, 1053-1060; 2010). A 30,000-50,000-year-old finger bone found in the Denisova Cave, southern Siberia, in 2008 yielded the genetic material. See go.nature.com/w3evow for more.


BRAZIL DRIVES GM CROP GROWTH
Planting of genetically modified (GM) crops grew by 8% in 2011 to 160 million hectares worldwide.


SEVEN DAYS | THIS WEEK

16-17 FEBRUARY
In Geneva, Switzerland, the World Health Organization will gather experts to discuss ‘urgent questions’ about research censorship and public safety, relating to unpublished work on mutant, transmissible strains of the H5N1 influenza A virus. See page 289 for more on the flu-virus debate.
go.nature.com/pf7bwv

20-24 FEBRUARY
Marine scientists’ responses to the Gulf of Mexico oil spill in 2010 are among topics discussed at the Ocean Sciences Meeting in Salt Lake City, Utah.
www.sgmeet.com/osm2012


PEOPLE 


China science prize 
Chinese physicist Xie Jialin, 
who pioneered the building 
of China’s first high-energy 
linear particle accelerator in 
1964, and helped to develop 
its first free-electron laser in 
1993, has won his nation’s top 
science and technology award, 
worth 5 million renminbi 
(US$794,000). The prize — 
which has been awarded by the 
country’s president annually 
since 2000 — was presented 
on 14 February. Architect and 
town planner Wu Liangyong 
also won. 


NATURE.COM
For daily news updates see:
www.nature.com/news
Obama shoots for science increase


US president wants to make room for research to grow in 
2013 — but faces an uphill battle. 


BY IVAN SEMENIUK, MEREDITH WADMAN, 
SUSAN YOUNG, ERIC HAND, EUGENIE 
SAMUEL REICH & RICHARD MONASTERSKY 


“It’s not every day you have robots running through your house,” Barack Obama quipped last week at the White House science fair, a showcase for student exhibitors that also gave the US president a chance to reiterate a favourite theme. Science and technology, he said, “is what’s going to make a difference in this country, over the long haul”.

Obama would clearly like to see many more robots, as well as researchers and engineers, running around in the future, a wish reflected in his budget request for fiscal year 2013, released on 13 February. The document’s message is one of big ambitions with fewer resources.

A year ago, Obama proposed bold increases for science agencies, but a Congress intent on curbing government spending refused to back many of them. This time, the White House has scaled back in several areas but boosted overall funding for non-defence research and development by 5%, pushing it up to US$64.9 billion.

“Overall, the budget sustains an upward trend,” says John Holdren, director of the White House Office of Science and Technology Policy in Washington DC. “Because of fiscal restraints, it’s not at the rate we preferred.”

With an election coming this November, House Republicans are unlikely to be generous with the president’s request. As in previous years, Congress could delay action on the budget, especially if it decides to wait for voters to weigh in on Obama’s presidency before making its decision. And the spectre of a severe across-the-board cut dangles over the government because of an act passed last year that aims to chop $1.2 trillion from spending, starting in January 2013.

Here is an overview of what the president's request would mean in key science domains (see ‘Tough decisions’).


16 FEBRUARY 2012 | VOL 482 | NATURE | 283
© 2012 Macmillan Publishers Limited. All rights reserved

NEWS | IN FOCUS


BIOMEDICINE AND PUBLIC HEALTH

The National Institutes of Health (NIH) in Bethesda, Maryland, by far the largest US research agency, sees its budget held level at $30.7 billion — a far cry from the $1-billion increase Obama proposed a year ago. Despite the ceiling, Lawrence Tabak, the NIH’s principal deputy director, sees the budget as “continuing our priorities in basic science”, and it allows the agency to boost the number of new and competing grants it funds by 8%.

The newly launched National Center for 
Advancing Translational Sciences (NCATS) 
in Bethesda will grow by 11%, to $639 million. 
Much of the rise goes to the Cures Accelera- 
tion Network, an effort to spur development of 
badly needed medicines through bold, multi- 
million-dollar grants. The programme's alloca- 
tion grows fivefold next year, to $50 million. 

Accomplishing all this within a flat budget 
requires cuts. Losers include the National Chil- 
dren’s Study, a long-term study of early influ- 
ences on the health of more than a hundred 
thousand children, which received $194 mil- 
lion in 2012, but has been cut by $29 million; 
and the Institutional Development Award 
programme, aimed at developing research 
infrastructure in rural and underserved states, 
which loses nearly $48 million. 

To pinch the pennies that will make new 
grants possible, the NIH plans to eliminate 
inflationary increases for some existing grants, 
cut others by 1% and keep grants seeking 
renewal at current levels. The agency predicts 
that these measures would boost the success 
rate for grant applications, currently at a his- 
toric low of 18%, but only to 19%. 

The flat-lined budget has drawn bleak 
appraisals from NIH advocates. “We are 
talking about a budget that is probably close 
to 20% smaller than it was a decade ago, 
adjusted for inflation,” says David Moore, 
senior director for government relations at 
the Association of American Medical Col- 
leges in Washington DC. 

Jennifer Zeitzer, director of legislative 
relations at the Federation of American Socie- 
ties for Experimental Biology in Bethesda, says 
that her organization will work with research 
champions to persuade Congress to boost the 
allocation for the NIH. The president's request, 
she says, “is not what we need to take advantage 
of the scientific opportunities that are before us”.

The outlook is even less favourable for the 
Centers for Disease Control and Prevention 
(CDC) in Atlanta, Georgia, which has had its 
budget cut by 12%, to make a total of 22% in 
cuts since 2010. Those cuts are, in part, coun- 
terbalanced by bringing in funds from a long- 
standing health-services evaluation fund and 
from the Prevention and Public Health Fund, 
which is part of the health reform law that 
Obama introduced in 2010. 

The dependence on the Prevention and 
Public Health Fund worries public-health 
advocates. Using the fund to patch holes in 
the CDC’s budget is “troubling”, says Emily 
Holubowich, executive director of the Coali- 
tion for Health Funding, based in Washington 
DC. “The future of the fund is tenuous at best.” 

Obama has also kept the budget mostly flat for the Food and Drug Administration (FDA) in Silver Spring, Maryland. However, the agency will receive a $583-million bolus from new industry user fees, mainly from food-registration and inspection fees and from makers of generic drugs and biosimilars.


TOUGH DECISIONS
The winners and losers in President Barack Obama’s budget request for 2013 (US$ millions).

Agency | 2011 actual | 2012 estimated | 2013 requested | Details

Biomedical research and public health
National Institutes of Health | 30,470 | 30,702 | 30,702 | Flat funding overall but 11% boost for translational science centre
Centers for Disease Control and Prevention | 5,726 | [illegible] | 5,068 | Efforts in public health and disease prevention bear the brunt of a deep cut
Food and Drug Administration | 2,403 | 2,506 | [illegible] | Flat government funding, but surge in industry user fees lifts overall budget by 17% to $4,486 million

Physical sciences
National Science Foundation | 6,806 | 7,032 | [illegible] | Big gains for interdisciplinary ideas — as well as marketable ones
NASA (science) | 4,919 | 5,074 | 4,911 | Flagship telescope still on track, but future of Mars exploration less certain
Department of Energy Office of Science | 4,897 | 4,874 | 4,992 | Overall increase masks cuts to some basic-research programmes
National Institute of Standards and Technology | 754 | 761 | 860 | Substantial increase, with around half to advanced manufacturing research

Earth and environment
Environmental Protection Agency | 8,681 | 8,450 | 8,344 | Despite ongoing declines, core science and regulatory programmes preserved
National Oceanic and Atmospheric Administration | 4,727 | 5,014 | 5,179 | Modest rise, with satellite programme getting much-needed boost
US Geological Survey | 1,084 | 1,068 | 1,102 | Increased money for disaster response and research on hydraulic fracturing

Source: White House Office of Management and Budget

The FDA has already been criticized for becoming too reliant on industry funding, but Margaret Hamburg, the FDA’s commissioner, says that the fees are needed to ensure effective and timely drug and device review. “There is a common good here,” she says.


PHYSICAL SCIENCES
The White House continues to support a long-term doubling of budgets for physical-science agencies, including the National Science Foundation (NSF) in Arlington, Virginia; the Department of Energy’s Office of Science in Washington DC; and the National Institute of Standards and Technology (NIST) in Gaithersburg, Maryland. The doubling, relative to 2006, is a goal of the America COMPETES Act, introduced under former President George W. Bush that year, and signed into law in 2007. Congressional appropriators have, however, slowed the pace of these agencies’ growth considerably since then (see ‘A long way to go’).

The budget also shifts funding towards the applied end of the research spectrum, where advances should translate into economic gains more quickly. It continues to fund I-Corps, a programme launched last year that partners entrepreneurs with scientists seeking to test the marketability of their research. And advanced manufacturing, which supports industry by developing measurement capabilities and standards to guide new product development, gets $149 million — money that NSF director Subra Suresh says will help to stem a decline in US manufacturing. “In times of constrained budgets, we need to be crystal clear about why NSF matters,” Suresh says.

The NSF emerges as a clear winner in Obama’s request, with a 5% boost to its bottom line. And one thing is very clear at the agency: researchers pursuing interdisciplinary research will be rewarded, with $63 million allocated to a programme that supports such work.

The NIST also gets a large increase, much of which is aimed at advanced manufacturing, including both a robotics programme and a ‘materials genome initiative’ that aims to speed up the development of new materials.

The Department of Energy’s Office of 
Science receives a more modest rise, much 
of which goes to its national laboratories 
and Energy Innovation Hubs. Several basic- 
research programmes are trimmed, including 
nuclear physics and high-energy physics, a 
shift that is consistent with the administra- 
tion’s emphasis on applied research that is most 
relevant to energy technology. 

“Basic research is systematically down,” says Milind Diwan, a physicist at Brookhaven National Laboratory in Upton, New York, and co-spokesman for a planned particle-physics experiment that received a drop in funding. “Those of us in fundamental research have to live within those priorities.”
At NASA, the talk is of “tough but sustainable choices” for an agency that would receive $17.7 billion in 2013, $59 million less than in 2012. Its science budget drops by 3.2%, but planetary science bears the brunt of that, with a cut of 21%. For years, NASA has been pursuing plans with the European Space Agency for joint missions to Mars in 2016 and 2018. But on Monday, NASA administrator Charles Bolden pulled the plug. “We just cannot do another flagship right now,” he said. Officials fear that the costs for these missions would spiral out of control, as they have for the $8.8-billion James Webb Space Telescope, a follow-on to the Hubble telescope that is slated for a 2018 launch.

The pinch will perhaps be felt most keenly at the Jet Propulsion Laboratory (JPL) in Pasadena, California, the traditional home of the Mars Exploration Program. Last year, the laboratory had to lay off the equivalent of 246 full-time employees, reducing its staff to 5,047. When the $2.5-billion Mars Science Laboratory lands in August, the JPL will have to quickly find new work for a few hundred employees, so the latest Mars cancellations make more layoffs likely. “Our expectation was that we’d have another mission to move these people on to,” says Richard O’Toole, the JPL’s manager of legislative affairs. “We definitely feel the pressure.”


ENERGY, EARTH AND ENVIRONMENT 

In Obama’s plan, spending on energy efficiency and renewable energy rises by $457 million, to $2.3 billion, with the largest increases targeting advanced manufacturing, and vehicle and building technologies. These programmes, run by the Department of Energy, are aimed at bolstering the competitiveness of industry. “Our motto is ‘Invented in America, made in America, sold worldwide’,” says energy secretary Steven Chu.

Included in the package are increases for 
research in solar energy, bioenergy and fossil 
fuels, including $155 million for carbon cap- 
ture and storage systems. But there are reduc- 
tions and shifts as well: the budget for wind 
power remains unchanged but is allocated 
mainly to offshore technologies. Spending on 
nuclear energy continues an ongoing move 
toward small, modular reactors. 

The National Oceanic and Atmospheric Administration (NOAA) in Washington DC receives a boost of 3%. That isn’t enough to offset both inflation and rising salaries, but it nonetheless protects a core agency priority: a programme of polar-orbiting weather and environment satellites that has been troubled by delays and cost overruns. Last year, NOAA requested a hefty increase of $688 million to get the programme back on track, but received just under two-thirds of that. This year, the satellite programme is boosted by $169 million.

NOAA watchers looking for signs of the president’s proposed reorganization of the Department of Commerce, which would move NOAA from there to the Department of the Interior, found no trace of the plan in the 2013 budget. The budget is also silent on another big initiative, the creation of a climate service within NOAA.


A LONG WAY TO GO
The push to double funding for three key US science agencies to boost innovation is proceeding more slowly than planned.
[Chart: funding in billions of current US$ for the National Institute of Standards and Technology, the Department of Energy Office of Science and the National Science Foundation, with economic stimulus funding shown separately, 2007 to 2013.]

In what could be a third straight year of 
declining budgets for the US Environmental 
Protection Agency (EPA) in Washington DC, 
the agency’s funding has been slashed by 1%, 
to $8.3 billion, almost $2 billion less than in 
2010. Nonetheless, funding for initiatives that 
target climate change and the environment 
rises slightly, to $807 million, protecting core 
science and regulatory efforts. To compensate, 
the White House has cut $359 million from a 
pair of clean-water grant programmes. These 
programmes are popular in Congress, and law- 
makers have reversed similar cuts in the past. 

“They did a pretty good job in making sure we are not hurting our environment and conservation programmes,” says Scott Slesinger, legislative director at the Natural Resources Defense Council in Washington DC. But Slesinger expects Congress to inflict further cuts.

With a 3% rise for its overall budget, the 
US Geological Survey (USGS) in Reston, 
Virginia, fares better than most mission-ori- 
ented science agencies. The agency's research 
and development portfolio expands from 
$675.5 million to $726.5 million. Part of the 
increase includes an extra $13 million for 
research on the effects of hydraulic fracturing, 
the process used by the oil and gas industry 
to squeeze hydrocarbons out of non-porous 
rock. The president has also pumped an extra 
$10.3 million into natural-hazards work, 
including $2.4 million for research on quick 
responses to earthquakes, volcanic eruptions 
and landslides, and $1.6 million to study the 
risk of earthquakes in the eastern United 
States, which was shaken by a magnitude-5.8 
tremor last August. 

Daniel Sarewitz, a geoscientist and co-direc- 
tor of the Consortium for Science, Policy and 
Outcomes at Arizona State University in Tempe, 
supports the increase for the USGS. “The survey 
doesn’t get a lot of attention, but it does things 
that are important for the nation and it’s struc- 
tured in ways that make its science very useful.” 


EDUCATION 

The administration has taken pains to advertise 
a $3-billion effort to increase and strengthen 
the future US science and technology work- 
force. For example, a combined expenditure 
of $135 million by the Department of Educa- 
tion and the NSF aims to boost the number of 
science and mathematics teachers by 100,000 
over the coming decade. An even more ambi- 
tious effort allocates an additional $81 million 
to increasing the number of science graduates 
by one million — roughly 30% more than there 
are today — over the same period. According 
to Carl Wieman, associate director for science 
at the Office of Science and Technology Policy, 
simply reducing the attrition of science majors, 
which currently runs as high as 60%, could 
drive much of that increase. 

Obama made a point of previewing both initiatives during the White House science fair, telling students there, “You give me confidence that America’s best days are still to come.” Now, as the budget goes to Congress, the battle to support lofty goals with real dollars begins a new round.
THE BROWNING OF THE PLANET


A model developed at the Potsdam Institute for Climate Impact Research in Germany combined 
temperature and precipitation projections from 19 general circulation models (GCMs) to predict the 
most likely regions of vegetation loss. Results are shown for two different warming scenarios. 
Models hone picture of climate impacts


International programme will improve predictions. 


BY QUIRIN SCHIERMEIER 


Will the warming planet be able to sustain coming generations? Few questions about the future matter more. But although modellers can forecast temperature changes and even precipitation, they struggle to say how climate change will affect the factors that make the planet habitable, such as food and water availability. Last week at the Potsdam Institute for Climate Impact Research (PIK) in Germany, researchers launched a fast-track programme to make their narratives of possible futures more coherent and useful to decision-makers.

Climate-impact models combine projections of change in physical climate with data on population, economic growth and other socio-economic variables. For various emissions scenarios, they forecast climate-driven changes in crop yields, vegetation zones, hydrology and human health (see ‘The browning of the planet’). But they often leave out important elements: for example, models of health impact often neglect the role of social factors in spreading disease; and models of water run-off may not account for changes in water loss from plants. Researchers have built dozens of models, but have never systematically compared their performance. As a result, say critics, the literature on climate impacts is as inconclusive as it is encyclopaedic.

“Impact research is lagging behind physical climate sciences,” says Pavel Kabat, director of the International Institute for Applied Systems Analysis in Laxenburg, Austria, which is to coordinate the fast-track programme jointly with the PIK. “Impact models have never been global, and their output is often sketchy. It is a matter of responsibility to society that we do better.”

The programme, dubbed the Inter-Sectoral Impact Model Intercomparison Project (ISI-MIP), involves more than two dozen modelling groups from eight countries, and they have set themselves a tight deadline. At the kick-off meeting at the PIK, the researchers agreed to complete a comprehensive set of model experiments within six months. All the simulations will cover the globe at the same resolution, and will be based on the same set of climate data from state-of-the-art climate models, driven by the latest greenhouse-gas emission scenarios (R. H. Moss et al. Nature 463, 747-756; 2010).

The comparison should reveal systematic 
biases that lead models to give widely differ- 
ing results. The ‘refined’ models will still give 
a range of answers — but the modellers hope 
that the diversity will be informative rather 
than frustrating. “We'll never be able to tell 
exactly what the future will look like,” says 
Ottmar Edenhofer, chief economist at the 
PIK. “But we can illuminate plausible paths, 
and multi-model comparisons help light up 
the black boxes.” For example, he says, the 
severity with which crop failures affect poor 
societies depends on factors such as global 
trade flows and local institutions and infra- 
structure — which some models can handle 
better than others. 

By January 2013, the project hopes to 
produce papers detailing the impact of 
climate change on global agriculture and water 
supplies, vegetation and health. The results 
could find their way into the next report of 
the Intergovernmental Panel on Climate 
Change (IPCC), which is set to be published in 
2013-14. “It will make a real difference for the 
assessment process,” says Chris Field, an ecolo- 
gist at the Carnegie Institution for Science in 
Stanford, California, and co-chairman of the 
IPCC’s working group on impacts, adaptation 
and vulnerability. 

The ISI-MIP will continue into 2013 and 
may be expanded to cover impacts on trans- 
port and energy infrastructures, both of 
which are vulnerable to the effects of rising 
temperatures and changing weather. It will 
also feed into other efforts, such as a five- 
year, 30-million-renminbi (US$4.8-million) 
Chinese programme on climate-related risks, 
such as floods and droughts. “This exercise 
will greatly inform our own studies,” says 
Qiuhong Tang, a hydrologist at the Chinese Academy of Sciences’ Institute of Geographic Sciences and Natural Resources Research in Beijing, and co-leader of the Chinese project. Impact modelling will be a major part of it, he says, because “climate change may greatly affect water resources and food security in the world’s most populous country”.
POLAR RESEARCH


LONG WAY DOWN
About two decades after drilling at Lake Vostok began, a Russian team has finally hit water, having coped with major technical problems and contamination concerns.


Vostok victory

Team finally drills into biggest Antarctic subglacial lake.


BY NICOLA JONES 


After two decades of chilly drilling and debate, a Russian team has finally broken into Lake Vostok. The largest of the lakes hidden under Antarctica’s ice, and the most deeply buried, Vostok has been isolated for millions of years and may contain specially adapted microorganisms. “I’m sure they’re drinking vodka this week,” says John Priscu, an Antarctic researcher at Montana State University in Bozeman, who has been in contact with the Russian team.

According to Valery Lukin, director of the Russian Antarctic programme, the drill hit lake water 3,769.3 metres down at 10:25 p.m. on 5 February local time (see ‘Long way down’). Temperatures were plummeting as the Antarctic summer ended, and scientists left the next day before it became too cold for planes to fly safely. “Talk about suspense. It has been a nail-biter for the past couple of weeks,” says Priscu.

Although the Russian scientists have taken 
samples, which are most likely to be from 
a pocket of water just above the lake (one 
container was presented to Russian Prime 
Minister Vladimir Putin with great fan- 
fare), they will have to wait until December 
to extract any frozen lake samples, and until 
2013-14 to retrieve unfrozen lake water. “This 
is a technological achievement. The scientific 
pay-off is still many years away,” says Mahlon 
Kennicutt, president of the international 
Scientific Committee on Antarctic Research. 

The Vostok drilling project began as an 
ice-coring effort to examine ancient climatic 
conditions. By the mid-1990s, scientists had 
confirmed that a giant lake lurked beneath 
the borehole and speculated that sampling its 
water might yield signs of ancient life¹. By the
end of the 1990s, the research community had 
agreed that the Vostok drilling should stop 
until researchers could be sure that the lake 
would be protected from contamination by 
the unsterile kerosene and Freon being used 
as drilling fluids. Drilling started up again in 
2005 with a new plan: when the drill neared 
the lake, it would be replaced with a thermal 
probe to melt through the ice, and a plug of 
clean silicone fluid that would help to protect 
the lake water from the dirty kerosene above².

Although it is unclear whether the Russian team used the thermal probe and silicone, it probably avoided contaminating the lake. When the drill broke through to the lake, water surged roughly 30-40 metres up the borehole, forcing 1.5 cubic metres of drilling fluid out of the top of the hole. “If everything went as they said, the only flow would be out of the lake, not into the lake,” says Kennicutt.

The lake water at the bottom of the hole will 
freeze, and the researchers plan to drill it out 
next season. Previous studies found cells in 
samples of accretion ice³ — the bottom couple
of hundred metres of the glacier made from 
frozen lake water — but contamination has not 
been ruled out. The fresh ice plug is unlikely to 
clear up that controversy, because the samples 
must be brought to the surface through drilling 
kerosene, says Kennicutt. The freezing process 
may also exclude or kill microbes, he adds. 

The Russian team plans to explore the lake 
in 2013-14 using a variety of probes, cameras 
and water samplers carried down the bore- 
hole in a hermetically sealed container. One 
probe will measure physical conditions such as 
temperature and acidity, while another will 
carry a spectrometer to study any organic 
compounds in the water. 

Meanwhile, the United Kingdom and the 
United States aim to sample water and sedi- 
ments from different Antarctic subglacial lakes 
a year earlier, in 2012-13. Both projects will 
use heated glacier meltwater to bore holes that 
should stay open for 24 hours, a cleaner and 
quicker process that should allow the UK team 
to get through 3.1 kilometres of ice into Lake 
Ellsworth in just 3 days. Vostok’s thicker glacier 
and lower temperatures would have made the 
process too energy-intensive to be practical 
there, however. 

Kennicutt hopes that the Vostok, Ellsworth 
and US Lake Whillans projects will form the 
first three nodes of a network that will better 
sample the hundreds of subglacial Antarctic 
lakes. “They’re not actually at the extremes 
of pressure and temperature, but they are 
limited in nutrients and energy,” says
Kennicutt. If life is eventually confirmed to reside in
these inhospitable places, “the question is how 
microbes make a living down there”.
Vostok drilling timeline (figure): 1990, drilling begins at Vostok Station; 1992, drilling of borehole 5G-1 begins; 1998, stopped for more than 5 years; 2007/08, drill stuck; 2011, ice-water boundary reached. Diagram labels: Borehole 5G-1, Borehole 5G-2, Lake Vostok, Accreted ice.
Sequencing set to alter
clinical landscape 


Access to whole genomes shifts potential for diagnosis, but 
poses challenges for doctors and regulators. 


BY ERIKA CHECK HAYDEN 


Sequencing a patient’s complete genome can cost as little as tests that target specific genes, and in a handful of cases it has led to a life-changing diagnosis. No wonder the technology is moving fast from bench to bedside. But as the trend accelerates, researchers are left grappling with complex questions about the medical value of patient genomes, and how sequencing in the clinic should be regulated.

Two clinical sequencing programmes that launched in January, one at Baylor College of Medicine in Houston, Texas, and the other at the University of California, Los Angeles (UCLA), suggest that genomics is about to enter the clinic on a large scale. Already, a growing list of companies and hospitals offer sequencing of whole genomes or the exome (the gene-coding part of the genome). Several other efforts are set to begin this year, and many are in the spotlight this week at the Advances in Genome Biology and Technology meeting in Marco Island, Florida.

“We’re all amazed by the speed at which this is happening,” says Michael Watson, executive director of the American College of Medical Genetics (ACMG) in Bethesda, Maryland. “I’ve never seen any genetic technology move at this pace.”

Sequencing sets of genes is already being used to guide cancer treatment, and many cancer centres expect to move to whole-genome or exome sequencing at some point. Clinical sequencing programmes aimed at identifying the causes of rare genetic diseases, mostly in children, are also being set up. But how often access to a patient’s full genome actually leads to a useful diagnosis is an open question, because most published reports have focused on one-off success stories (see ‘Put to the test’).

Results from early clinical sequencing efforts are beginning to quantify these successes. Tina Hambuch, scientific liaison at Illumina, based in San Diego, California, which began offering clinical sequencing in 2009, says that a
whole genome yields a diagnosis in about 40% of cases. And Wayne Grody, a medical geneticist and medical director of UCLA’s clinical sequencing programme, says that, so far, about half of the 10 patients who have come through his programme have received a diagnosis. Geneticists caution, however, that these samples are small and highly selective, and that the true rate of diagnosis has yet to be determined.

NATURE.COM
Read more about whole genomes in the clinic at: go.nature.com/yv7rls

PUT TO THE TEST
A growing number of successes have spurred the use of whole-genome sequencing as a diagnostic tool.
Human genome draft completed by competing teams.
First sequence of an individual human, James Watson.
First sequenced family uncovers causative gene for Miller syndrome.
Doctors help to restore health of Nicholas Volker after sequencing indicates that his inflammatory bowel disease could be alleviated by a bone-marrow transplant.
Sequencing spares a woman with leukaemia from undergoing a bone-marrow transplant.
Doctors report using whole-genome sequencing to improve treatment for a patient with the movement disorder dopa-responsive dystonia.

A further question is cost effectiveness. 
One speaker at the Marco Island meeting 
is Heidi Rehm, director of the Laboratory 
for Molecular Medicine at the Partners 
HealthCare Center for Personalized Genetic 
Medicine in Cambridge, Massachusetts. Rehm 
says that, on a standard test for known hearing- 
related genes, about 20% of the children who 
come to her clinic with deafness of unknown 
cause test positive for mutations in the gene 
that most often causes hearing loss. About 
the same proportion of the rest test positive 
for mutations in other hearing-related genes. 
Yet the standard tests for these sets of muta- 
tions can cost thousands of dollars. This 
means that, at US$2,500-9,500, sequencing a 
complete genome or the exome costs about the 
same, and so a larger number of patients might
obtain a diagnosis. 

But interpreting a diagnosis is still no easy 
matter, because much of the variation in the 
genome remains poorly understood. “That’s
the most challenging task we face,” says Rehm,
who last December received one of six grants 
awarded by the US National Human Genome 
Research Institute in Bethesda, Maryland, to 
explore how clinical sequencing affects health 
outcomes in a variety of diseases. 

In some instances, the diagnosis can lead 
to a treatment. One of the UCLA patients, for 
instance, was diagnosed with a rare genetic 
disease that is treatable by steroids. But such 
stories are the exception rather than the rule. 
“I would say a fairly rare percentage of cases
are going to be like that,” says Hambuch. More
often, she says, sequencing reveals a rare and 
untreatable mutation — information that, 
although unwelcome, “can save the family and 
the medical community from doing more test- 
ing that isn’t going anywhere. If you account 
for that, then the impact is pretty big.”

Deciding what information to give doc- 
tors and patients raises its own complex set of 
questions, such as whether children should be 
told of their increased risk of an adult-onset 
disease, and whether doctors who do not 
reveal such information may later be sued for 
malpractice. Different clinics approach these 
questions in different ways. For instance, 
Illumina is planning to allow patients and their 
doctors to choose which test results they want 
to know, and which they do not — for exam- 
ple, whether or not a patient carries the genetic 
mutation that causes Huntington’s disease, an 
incurable neurodegenerative disorder. 

Another area of uncertainty is how regula- 
tors will enter into the equation, which could 
involve oversight of labs, machines, bioinfor- 
matics tools and the format of the delivered 
results. A number of professional groups, 
including the College of American Patholo- 
gists, headquartered in Northfield, Illinois, 
and the ACMG, are working on guidelines 
for the field, and some of these will be rolled 
out this year.
Death-rate row blurs
mutant flu debate 


Even if a 59% mortality rate for H5N1 is too high, the virus
could still cause a flu pandemic more serious than that of 1918. 


BY DECLAN BUTLER 


When the US National Science Advisory Board for Biosecurity called for redaction of two papers on mutant strains of H5N1 avian influenza virus, one reason it cited was the high fatality rate of the wild virus’. But some virologists claim that the mortality rate (59% in officially confirmed cases; see go.nature.com/3ys4py) has been vastly overestimated.

In an opinion article’ published last month, Peter Palese and Taia Wang, virologists at Mount Sinai School of Medicine in New York, argued that the 59% figure was “driving this controversy”, but that it was “likely orders of magnitude too high”.
Vincent Racan- 
iello, a virologist at 
Columbia Univer- 
sity in New York, 
told Nature that he 
believes any H5N1 
pandemic would 
not come close to 
the 2-2.5% global 
mortality rate of the 
1918 flu pandemic. 
The virologists are reviving an old argument’ 
that the 59% rate, derived from officially 
confirmed World Health Organization 
(WHO) numbers of cases and deaths (see 
‘Mortal danger’), does not take into account 
undetected cases or asymptomatic infections. 
Since the current H5N1 outbreak began 
in 2003, researchers have done several sero- 
prevalence surveys looking for antibodies 
to the virus in people from outbreak areas. 
Racaniello and his colleagues say that some 
survey results imply that H5N1 infections are 
common and the fatality rate is low. But all 
other scientists contacted by Nature disagree. 
Malik Peiris, a flu virologist at the University of Hong Kong, says that the higher seroprevalence rates seem to be outliers. One study found that 5.6% of people tested had antibodies against H5N1 (ref. 4). All of the others found much lower levels, and most found no detectable antibody.

MORTAL DANGER
Since 2003, 584 confirmed cases of H5N1 flu have led to 345 deaths. (Chart shows non-fatal and fatal cases; numbers indicate total cases.)

NATURE.COM
For more on mutant H5N1 flu: nature.com/mutantflu

“The consensus is that the rates are quite 
low, though not non-existent,” says Peiris. 

Jeremy Farrar, director of the Oxford 
University Clinical Research Unit in Ho Chi 
Minh City, Vietnam, agrees. “Across south- 
east Asia we have not found much evidence 
of seropositivity,” he says. A review of H5N1
seroprevalence studies published last year” 
also concluded that “transmission of the H5N1 
virus from poultry to humans is rare”. 

Racaniello says that the studies that found no 
or little seroprevalence “were simply done in 
the wrong populations”, and that more
studies are needed. 

Many scientists 
say that focusing on 
the exact numbers 
misses the wider 
picture — even if 
the fatality rate has 
been overestimated, 
the virus is still a 
severe pandemic 
threat. “I don’t care 
whether this virus 
has a case-fatality 
rate of 50% or 5% or 
1%, it would still be 
a really big problem,” says Marc Lipsitch, an 
epidemiologist at the Harvard School of Public 
Health in Boston, Massachusetts. 

In fact, although the true mortality rate of 
H5N1 is likely to be lower than 59%, the epide- 
miological data suggest that it would dwarf the 
0.1-0.4% rate assumed in many countries’ pan- 
demic preparedness plans, and could far exceed 
that of the 1918 pandemic. Robert Webster, a 
flu virologist at St Jude Children’s Research 
Hospital in Memphis, Tennessee, warns that 
the H5N1 virus is a fearsome pathogen. “You 
walk into a poultry house that has this virus and 
everything is dead,” he says. “If that sort of virus
were to get into humans... my God.”
A South African archaeologist digs into his own past to seek connections between climate change and human development.

BY JEFF TOLLEFSON

Metal scrapes on hard sand as archaeologist Chris Henshilwood shaves away the top layer of sediment in Blombos Cave. After just a few moments, the tip of his trowel unearths the humerus of a pint-sized tortoise that walked the Southern Cape of South Africa many millennia ago. Next come shells from local mussels and snails amid blackened soil and bits of charred wood, all remnants of an ancient feast. It was one of many enjoyed by a distinct group of early humans who visited Blombos Cave over the course of thousands of years.

The Still Bay culture was one of the most advanced Middle Stone Age groups in Africa when it emerged some 78,000 years ago in a startlingly early flourishing of the human mind. Henshilwood’s excavations at Blombos Cave have revealed distinctive tools, including carefully worked stone points that probably served as knives and spear tips, and bits of rock inscribed with apparently symbolic designs. But evidence of the technology disappears abruptly in sediment about 71,000 years old, along with all proof of human habitation in southern Africa. It would be 7,000 years before a new culture appeared, with a markedly different toolkit, including crescent-shaped blades probably used as arrowheads.

Chris Henshilwood inspects a cave on the South African coast near Blombos. (J. TOLLEFSON)

What drove the coming and going of these 
early cultures? At about the time the Still Bay 
culture disappeared, the globe — already in the 
middle of a glacial period — began to cool even 
further, causing sea levels to fall (see ‘Crucible 
of culture’). “Humans are very adaptable,” says
Henshilwood, “but I think climate must have 
played some role in the demise of the Still Bay.” 

If there is a link, it may hold broader impli- 
cations. Genetic data suggest that the entire 
population of modern humans contracted at 
around the same time, then rebounded and 
expanded in Africa and onto other continents. 

Multiple teams are now racing to determine 
the part climate might have played in driving 
human evolution during this period. Blombos 
Cave, with its detailed archaeological record 
of the Middle Stone Age, could become a key 
testing ground. With Francesco d’Errico, an 
anthropologist at the French National Cen- 
tre for Scientific Research (CNRS) in Bor- 
deaux, Henshilwood has assembled a team of 
archaeologists, climate modellers and palaeo- 
climatologists for a five-year, €2.5-million 
(US$3.3-million) project to look at correlations 
between climate and culture during the eventful 
span of prehistory that includes Still Bay, and the 
beginnings of modern human behaviour. 

“These are very daunting questions indeed, 
but I think they are answerable,” says Henshil- 
wood, a native of Cape Town who now works 
at the University of Bergen in Norway. “If we 
can get some good climatic data, we can at least 
hazard some guesses.” 


PERSONAL HISTORY 

Outside the cave, a cool November breeze 
scours the steep slope to the shore, which 
Henshilwood has known since he was a child. 
His grandfather bought this land on the 
Southern Cape as a fishing retreat in 1961 and 
Henshilwood spent his holidays searching the 
hills and caves for ancient artefacts. 

Those experiences served him well in 1985, 
when, out of sheer boredom in his mid-thir- 
ties, he decided to leave the family department- 
store business and enrol in an archaeology 
course at the University of Cape Town. In 
1991, as a PhD student on a scholarship at the
University of Cambridge, UK, he returned to
Blombos in search of the same kind of artefacts
that he had found as a child. What he discovered
was much more significant and far older:
a series of bone tools and double-sided stone 
points that were clearly tied to the enigmatic 
Still Bay period. 

“It was right over there,” he says, motion- 
ing to the back of the cave. “Nobody 
believed us, because nobody had found a 
Still Bay site for 40 years.” 

CRUCIBLE OF CULTURE
During the latest ice age, human populations in southern Africa went through profound changes that sometimes coincided with major environmental shifts.
First appearance of anatomically modern humans in Africa.
1 Residents of Blombos Cave processed ochre to produce and store the pigment.
2 Still Bay culture inhabits Blombos Cave and other sites in southern Africa.
4 Howiesons Poort culture flourishes in southern Africa.
5 Expansion of populations in Africa and onto other continents.

The Middle Stone Age was not part of his thesis, so Henshilwood covered the site up and moved on. Only in 1997 did he secure funding for a full excavation from the US National Science Foundation. In 2002, Henshilwood published a study' in Science documenting pieces of red, iron-rich rock called ochre, which were engraved with cross-hatched patterns. He argued that the 77,000-year-old etchings were examples of symbolic behaviour and represented the earliest known evidence of abstract thought. These and other findings have challenged the once-dominant idea that human culture, as exemplified by art such as carvings and jewellery, appeared in an explosive transformation during the Late Stone Age, some 40,000-50,000 years ago, in north Africa
and Europe. Blombos and other sites suggest a more gradual cultural and technological development, beginning far earlier, during the Middle Stone Age throughout Africa.

Engraved ochre shows clearest evidence of early symbolic thought.

On a visit to Blombos in November, the cave looks like a war bunker, complete with a generator, lights and sandbags. The team has excavated just enough earth to create a workspace for a crew of five. Hundreds of steel tabs mark strata on vertical walls of sediment. While Henshilwood works on the cave’s top layer, dating from 72,000 years ago, his partner and co-excavator, Karen van Niekerk, sifts through the bottom strata of sediments, which are roughly 100,000 years old. Centimetres away, the same layer yielded Henshilwood’s most recent blockbuster find”: a toolkit of shells, grindstones and crushing stones used to process and store ochre, possibly for use as pigment or for utilitarian purposes such as tanning hides or cleaning wounds. It was further evidence that Homo sapiens had developed planning skills and sophistication far earlier than was once believed.

DID CLIMATE WIPE OUT THE STILL BAY CULTURE? OR DID THE PEOPLE MOVE AWAY OR PERHAPS JUST ADAPT OVER TIME?

Now a postdoctoral researcher at the University of Bergen, van Niekerk has been working with Henshilwood since the early days at Blombos. It’s a good life, she says, “and a lot of work”. On this day she finishes early and heads to Henshilwood’s beach house and scientific base to help a master’s student, Cornelia Albrektsen, to conduct an experiment using home-made stone and bone tools. They struggle for the better part of an hour trying to replicate the way ancient people might have opened shellfish. Then Henshilwood shows up. “Give me one,” he says, grabbing a shell.


16 FEBRUARY 2012 | VOL 482 | NATURE | 291 


© 2012 Macmillan Publishers Limited. All rights reserved 


| NEWS | FEATURE 


Within minutes, Henshilwood pops open sev- 
eral snails and determines which tools work 
best. He then departs to clean up for dinner, 
leaving the stunned crew to finish the experiment. “It was really impressive,” Albrektsen
says later. “He was getting all caveman-like.”

During a break in the excavations, Henshil- 
wood stares out to sea and wonders aloud 
whether the Indian Ocean holds answers. 
Palaeoclimate records from marine sediment 
and ice cores suggest’ that around the time the 
Still Bay culture disappeared, global tempera- 
tures dropped and the polar ice sheets grew. 
Ocean levels fell, and the Still Bay people may 
have followed the sea onto the continental shelf, 
which would have become a productive plain. 

If this idea is accurate, most of the evidence 
would have been submerged as the ocean 
returned over the past 15,000 years. Henshil- 
wood has hiked along more than 240 kilome- 
tres of coastline in search of caves that might 
hold clues to the fate of the Still Bay. He hasn't 
found any yet, but he is beginning excavations 
on a site called Klipdrift Shelter, west of Blombos, that could allow him to look at the rise
of Still Bay’s successor: the Howiesons Poort 
culture, which appeared 65,000 years ago and 
persisted for about 5,000 years. 


TIME AND TIDE 
Taking a break from Blombos, Henshilwood 
visits the new site with Simon Armitage, a 
mineral-dating specialist at Royal Holloway 
University of London. Armitage uses a tech- 
nique called optically stimulated luminescence 
to determine the last time a sample of dirt 
saw sunlight before being buried. The 
method requires Henshilwood 
and others to cover Armitage 
with a thick black tarpau- 
lin and sit on its edge to 
prevent any light from 
fouling the measure- 
ments. While waiting, 
Henshilwood talks 
about the significance 
of the site, which has 
already yielded a human 
tooth and some artefacts 
with markings that could be 
engravings. He says the find- 
ings may turn out to be more 
fascinating than the decorated 
ochre pieces that made Blombos famous. 

Once the site has been dated, the research- 
ers will add it to environmental and cultural 
records from southern Africa and Europe. To 
construct a climate record, Henshilwood’s team 
is sampling cave deposits, in search of clues to 
ancient rainfall and temperatures. They are 
also testing ocean sediment cores for pollen 
and traces of charcoal that hint at vegetation, 
rainfall and the frequency of fires. 

The palaeoclimate data will allow a team at the CNRS to build a high-resolution model of climate in Europe and southern Africa, beginning with the time spanning the Still Bay and Howiesons Poort cultures. The last step is to overlay the climate and cultural data onto an ecological model to analyse the environmental space occupied by specific cultures throughout time. The team can then look for links. Was one industry, for example, always associated with a particular environment? Do similar cultures occupy similar landscapes or respond to climatic shifts in similar ways?

Stone tools from Blombos Cave.

“We can start to test our hypotheses about
the role of ecology and the environment,” says
William Banks, who runs the modelling at the 
CNRS in Bordeaux. 

Henshilwood and his colleagues have some 
friendly competition. Curtis Marean, an archae- 
ologist at Arizona State University in Tempe, 
came to the cape shortly after Henshilwood, 
inspired by the genetic evidence of a popula- 
tion crash in the Middle Stone Age and thinking 
that the cape would have been a good place for 
humans to ride out hard times. He partnered 
with Henshilwood on a paper’ examining bone
tools from Blombos in 2001 and went on to doc- 
ument the use of pigments’ and heat-treatment 
of stone tools° 164,000 years ago at Pinnacle 
Point, less than 100 kilometres east of Blombos. 

He is also looking to the sea for answers. 
Marean and a team of researchers have already 
produced an assessment’ of historical sea lev- 
els around Pinnacle Point, and now they have 
received money from the National Geographic 
Society in Washington DC and the US National 
Science Foundation to build a detailed geophysical map of the continental shelf. Marean thinks that the exposed shelf would have been a diverse shrubland ecosystem with edible roots, big game for hunting and marine resources. His goal is to reconstruct the vegetation, and then use models to analyse how people might have exploited those resources. “We need to develop a thick empirical record and put that into a really tight timescale,” says Marean. “Once we have that, we can start debating the whys.”

Alison Brooks, director of 
the Center for the Advanced 
Study of Hominid Paleo- 
biology at the George Washington University 
in Washington DC, says that Henshilwood and 
others are producing much-needed data and 
hypotheses, but she warns against the dangers 
of oversimplification. Brooks is co-authoring a 
forthcoming publication that aligns palaeocli- 
mate data with archaeological data throughout 
Africa, and she says that each region of the continent seems to have its own story. “There’s a lot of complexity here,” she says.

NATURE.COM
For more pictures from Blombos and other sites, visit: go.nature.com/2kypzy
Archaeological excavations at Blombos Cave have
yielded early evidence of abstract thought. 


Henshilwood acknowledges that compar- 
ing environmental and cultural data may not 
yield concrete answers. The disappearance of 
the Still Bay, he says, could have resulted from 
climatic change, migration, the arrival of new 
people or simply cultural evolution over the 
course of thousands of years. 

Back in the cave, Henshilwood settles down 
into a familiar routine: digging carefully through 
the sediments and thinking about the past. He 
uncovers the remains of a clam that lives along 
sandy beaches and a mussel that prefers rocky 
shores, evidence that the Still Bay people had 
access to a varied coastline much like the one 
he has been exploring all his life. Just behind 
Henshilwood is another hole, carefully filled 
with sandbags. He dug that in 2007 as a test plot 
and found that the sediments inside Blombos 
date back at least 130,000 years, with artefacts 
dispersed throughout. “But that’s for another
day,” he says, glancing at the wall of dirt in front
of him. “Or another year, another decade.”
SEE BOOKS & ARTS P.304 


Jeff Tollefson covers energy and environment 
for Nature in New York. 
Outside
the fold 


Susan Lindquist has challenged conventional thinking 
on how misfolded proteins drive disease and may power 
evolution. But she still finds that criticism stings. 


BY BIJAL P. TRIVEDI 


On a frigid winter’s morning in 1992, Susan Lindquist, then a biologist at the University of Chicago in Illinois, trudged through the snow to the campus’ intellectual-property office to share an unconventional idea for a cancer drug. A protein that she had been working on,
Hsp90, guides misfolded proteins into their proper conformation. 
But it also applies its talents to misfolded mutant proteins in tumour 
cells, activating them and helping cancer to advance. Lindquist sus- 
pected that blocking Hsp90 would thwart the disease. The intellectual- 
property project manager she met with disagreed, calling Lindquist’s 
idea “ridiculous” because it stemmed from experiments in yeast. His 
“sneering tone”, she says, left an indelible mark. “It was actually one of
the most insulting conversations I’ve had in my professional life.” It led 
her to abandon her cancer research on Hsp90 for a decade. Today, more 
than a dozen drug companies are developing inhibitors of the protein 
as cancer treatments. 

Lindquist seems able to shrug off such injustices, now. Her work over 
the past 20 years has consistently challenged standard thinking on evo- 
lution, inheritance and the humble yeast. She has helped to show how 
misfolded infectious proteins called prions can override the rules of 
inheritance in yeast, and how this can be used to model human disease. 
She has also proposed a mechanism by which organisms can unleash 
hidden variation and evolve by leaps and bounds. She was the first 
female director of the prestigious Whitehead Institute for Biomedical 
Research in Cambridge, Massachusetts, and has received more than a 
dozen awards and honours in the past five years. 


In a paper being published this week in Nature, she and her colleagues show that in wild yeast, prions provide tangible advantages, such as survival in harsh conditions and drug resistance’.

NATURE.COM
To hear an interview with Susan Lindquist: go.nature.com/alzlss

What is most striking about Lindquist, however, is that despite having the self-confidence to take on controversial projects, she remains remarkably sensitive to criticism. The sting of rejection from the Chicago intellectual-property office may have dulled, but she recognizes and is dismayed by what she sees as a growing incivility among colleagues, a meanness that she thinks threatens the progress of science. “I feel like the profession is getting less and less genteel and more and more cut-throat,” she says.


Prion proteins are 
responsible for the 
colour differences in 
some strains of yeast. 


ISHNAN/S. LINDQUIST 


HEATING UP RESEARCH 

Lindquist began her career at Harvard University in Cambridge, 
Massachusetts, in 1971, in the laboratory of Matthew Meselson, a bio- 
chemist famous for helping to show how genetic information is copied 
and inherited. “He was a brilliant scientist,” Lindquist says, but when
she started he was spending much of his time lobbying for a federal 
ban on biological weapons in the United States. “So he was never here.” 

She found the lack of a mentor very stressful in those early days. 
“It was terrifying and I almost left a couple of times,” she says. Working
more or less on her own, Lindquist decided to probe a mysterious
phenomenon that researchers were exploring, called the heat-shock 
response. When fruitfly larvae are exposed to high temperatures, certain 
regions of their chromosomes ‘puff up’ as genes at these sites frenetically 
produce RNA. In work that would culminate in her PhD and eventually 
shape her career, Lindquist showed that applying heat to cultured fruitfly 
cells triggers an emergency response in which the cells manufacture 
heat-shock proteins, such as Hsp90, to protect themselves”. 

When Lindquist published her data, she says, “an awful lot of people 
thought it was nonsense”. Colleagues dismissed the findings as an artefact 
— the result of heat denaturing proteins. Although the work was published in a prestigious journal, Lindquist took the criticisms hard. Her lab mate, collaborator and close friend at the time, Steven Henikoff, now at the Fred Hutchinson Cancer Research Center in Seattle, Washington, wondered, “How can such a nice person survive in this field?”

Susan Lindquist’s career has been shaped by her investigations of proteins produced in response to ‘heat shock’.

With her newly minted PhD, Lindquist started a postdoctoral 
fellowship in 1976 at the University of Chicago. Two years later, the 
university offered her a tenure-track position. Lindquist became inter- 
ested in studying heat-shock proteins in yeast, partly because it would 
allow her to manipulate genes more easily than in flies. A faculty mem- 
ber warned her against changing organisms until she had tenure, but 
Lindquist ignored the advice, assuming that she had little chance of 
getting tenure anyway. “It was really very, very difficult being a woman
in science then,” she says. So she pursued what she found most mysterious and fascinating.

That courage often seems to be lacking in younger scientists these 
days, Lindquist laments. She recalls struggling to convince students or 
postdocs to take on risky projects, only to learn later that when they did, 
their lab mates mocked them. “That shocks me,” says Lindquist. She 
has often been afraid of being wrong — a 
fear that led to lots of repeated experiments 
— but “I didn’t have a fear of a new idea”.

Most of the new ideas Lindquist has 
developed met with resistance. In late 1993, 
when she proposed that a heat-shock pro- 
tein called Hsp104 could untangle and dis- 
mantle clumps of protein, Nature initially 
rejected her paper. The ideas struck many 
as absurd, Lindquist says. “When I gave a 
talk about it, reactions ranged from scepticism
to outright disbelief.” The work was
eventually published the following year’. 

Still, she was literally staring at her rejected manuscript when she 
received a call from Yury Chernoff, then a postdoc in Susan Liebman’s 
lab at the University of Illinois at Chicago, who had found that Hsp104 
influenced a bizarre colour trait in some yeast strains. Geneticist Brian 
Cox, then at the University of Liverpool, UK, first described this trait’, 
called [PSI+], in yeast in 1965. Cox noted that when white yeast strains
mate with red ones their progeny produce only white offspring, rather 
than the mixture of red and white predicted by conventional genetic
theory. According to one hypothesis, the trait was actually passed
on not by genes but by a misfolded protein that worked like the self-
replicating, disease-causing prions known to trigger fatal neurological
disorders such as Creutzfeldt–Jakob disease.

“I FEEL LIKE THE PROFESSION IS GETTING LESS AND LESS GENTEEL.”

Prions join together to form long, ‘amyloid’ fibres. Working with 
Chernoff, Lindquist showed how Hsp104 controls the [PSI+] trait by 
chopping up fibres of a protein called Sup35 (ref. 5). Short segments
of these Sup35 fibres are passed to daughter cells and act as a template 
for more to form. Watching the yeast prions pass from mother cell to 
daughter cell was “pretty magical”, Lindquist says. Moreover, the results
suggested that simple yeast cells could be used to study the proteins 
that cause neurodegenerative disorders in humans — another idea that 
colleagues found hard to swallow. 

For the next 15 years, Lindquist expanded her study of yeast prions. 
Chernoff, now editor-in-chief of the journal Prion and based at the 
Georgia Institute of Technology in Atlanta, says that Lindquist pio- 
neered many of the biochemical and molecular techniques now 
used for studying yeast prions. But her 
controversial hypotheses, he says, have 
really driven the field forward and pro- 
voked discussion and new experiments. 
Lindquist suggested that yeast prions 
are widespread and may be beneficial in 
some cases because they are able to switch 
between soluble, active states and fibrous, 
inactive states®. 

Many have suggested that the prions 
she has been observing are artefacts of 
laboratory culture techniques that force 
proteins to behave in unnatural ways. But 
in her most recent paper’, Lindquist has shown that about one-third 
of the 700 or so wild yeast strains she examined harboured prions. In 
almost half of those strains, the prion seems to confer a beneficial trait. 
For example, a strain isolated from white wine is resistant to acidic 
environments and to the anti-fungal drug fluconazole; and a strain 
harvested from Lambrusco grapes is resistant to a DNA-damaging 
agent. When the prions in these strains are eradicated, or ‘cured’, these
useful traits disappear. 

Lindquist has also continued her studies of Hsp90. When, in the
1990s, she disabled, or knocked out, both copies of the gene that makes 
Hsp90 in fruitflies, the creatures died; but when she knocked out just 
one copy, something mysterious happened. Flies were born with a 
hodgepodge of physical deformities, such as shrunken or square eyes, 
shrivelled wings and crooked legs’. 

Lindquist realized that Hsp90 was chaperoning proteins that contain 
detrimental mutations into a working form, thereby hiding their effects. 
Removing half the Hsp90 meant there wasn't enough of it to go around, 
proteins could no longer fold correctly, and the effects of all the hid- 
den mutations became apparent. Lindquist hypothesized that the same 
thing happens during a natural crisis such as starvation or a change in 
temperature or pH. The environmental shock makes more proteins 
misfold and these suck up the available 
Hsp90, leaving a surplus of incorrectly 
folded proteins that could spawn the 
evolution of new traits. Most of this mis- 
folding will be bad, says Lindquist. But if 
any of it yields a cell that is well adapted 
to the new conditions, some organisms 
could survive and thrive. 

Lindquist calls Hsp90 a “capacitor” 
for evolutionary change. Just as an elec- 
trical capacitor stores electrical energy, 
Hsp90 lets hidden variation build up in 
the genome. When an environmental 
stressor trips the switch, dramatic vari- 
ations can be unleashed. She found the 
same kinds of effects in the plant Arabi- 
dopsis thaliana — upturned and extra 
roots, exotic leaf whorls and darker hues 
appeared when the heat-shock protein 
system was put under stress*. Lindquist 
suggests that studying this phenom- 
enon would be a powerful approach for 
discovering hidden variation in plants 
— unlocking the basis of traits such as 
drought resistance or salt tolerance. 

Lindquist says she was unaware that 
these ideas would upset people. Many 
in the evolutionary-biology community 
adhere to the idea that evolution pro- 
ceeds in slow, tiny steps, not the big bursts she was proposing. Nick 
Barton, an evolutionary geneticist at the University of Edinburgh, UK, 
says that the suggestion that the chaperone system releases “useful” 
variation when needed is controversial. “I really don't think there is 
much evidence for an adaptive role,” he says. 

Others are more open to the hypothesis. This mechanism should be 
incorporated into evolutionary theory, says Massimo Pigliucci, an evo- 
lutionary biologist and philosopher at the City University of New York 
Graduate Center. Pigliucci says that Lindquist “put empirical meat on 
ideas that have been around for a while”. Still, he asks, “How important
are these in the evolution of lineages?” It may take another 20 years to 
work that out, he says. 

In August 2001, Lindquist moved from the University of Chicago 
to take the helm of the Whitehead Institute. It was an honour, but also 
a draining position that she held for only three years. She oversaw the 
separation of Whitehead from its genome centre, a sequencing power- 
house that had contributed much of the data for the Human Genome 
Project. It was a financially messy ordeal that left her desperate to focus 
on science, and particularly on disease-related research. 

Even though she hasn't been the one to develop them, Hsp90 
inhibitors have already begun to show some promise. More than 
20 clinical trials are exploring their effect in cancer. “It’s a hot topic,”
says Len Neckers, a cancer biologist at the National Cancer Institute in
Rockville, Maryland, who identified the first Hsp90 inhibitor 20 years
ago. The inhibitors might also target drug-resistant fungi that cause
deadly infections in people with suppressed immune systems’.

The mutations responsible for these flies’ deformities are present in normal-looking flies, but their effects are usually hidden by ‘chaperone’ proteins.

Lindquist’s expertise in protein folding fuelled an interest in neuro- 
degenerative diseases. Amyloid fibres are also present in Alzheimer’s, 
Parkinson’s and Huntington's diseases, and Lindquist has championed 
yeast as a model for studying their effects in these conditions. In a 
study published last year”, she showed that a pile-up of the amyloid-β
protein, a hallmark of Alzheimer’s, is toxic to yeast, slowing its growth.
She then used the model to screen 5,000 yeast genes for ones that might 
affect this toxicity. The approach was successful: it turned up 40 genes, 
12 of which had human homologues, and one of which is a known risk 
factor for Alzheimer’s. Two others interact with known risk factors. 

Her hope is to pin down in yeast the initial steps that lead to amyloid 
formation in Alzheimer’s, then to iden- 
tify drugs that prevent it. The approach 
continues to raise eyebrows, however. 
“Many wondered how she could possibly 
model things such as Alzheimer’s and 
Parkinson's in yeast — which are a single 
cell, have a short life span and, of course, 
don’t have a brain,” says Nancy Bonini, a
neurogeneticist at the University of Penn- 
sylvania in Philadelphia. 

Her grant applications have received 
“very mixed reviews”, says Lindquist — a
charitable description, she adds. Many 
hardworking scientists with great ideas 
get their proposals turned down, she says, 
but she worries that the tough funding cli- 
mate is dragging down the tone of grant 
and paper reviews. “They get exhausted, 
tired, and they get cranky. And then they 
get a paper to review.” She pauses, leans
forward and says emphatically, “I think 
we have to stop and say, ‘No, let’s not do 
this. Let's not be mean to somebody else 
because someone was mean to you.’”

Meselson, she says, instilled in her the 
importance of ethical and compassion- 
ate scientific conduct. It is something she 
has worked hard to emulate and pass on 
to her own trainees. In late 2010, she
wrote a short commentary’' entitled ‘Three quite different things that
matter to me’. Think and train broadly, she wrote; be kind, be generous,
don't try to destroy someone; and, have faith. 

Her work and the testaments of colleagues speak to her success with 
the first two, and her own words attest to the last: “When I think about 
my kids’ future I feel very concerned,” she says, tearing up as she lists
the world’s environmental, social, economic and political woes. “And 
then I go to a lecture. I’ll hear someone get up and talk about their work,
and they've done something amazing. The profession that I live and 
breathe gives me hope.” 


Bijal P. Trivedi is a freelance writer in Washington DC. 
In many indigenous cultures, including that of the Coeur d’Alene tribe, dance is used to transmit stories and teachings to younger generations.


Adapted to culture 


Mark Pagel proposes that our ability to share and build on ideas is what made us human.


What made us human? I propose
that the development of a unique
capacity for culture around
200,000 years ago was the defining event
in the evolution of modern humans. A fast-paced
evolutionary process emerged, which
by 60,000 years ago had propelled modern
humans out of Africa in small tribal societies
to occupy and re-shape the world in just a
few tens of thousands of years.

Culture became our strategy for survival.
Our ability to learn from others and to transmit
and build on knowledge, technology
and skills might be the most potent trait the
world has seen for converting new lands and
resources into more humans. Whereas other
species are confined to the environments their
genes have adapted them to, we have adapted
to nearly every environment on Earth.


A capacity for culture makes humans unique

Transmitting technology and skills is our strategy for survival

We became ultra-social through visual theft, the stealing of others’ ideas

Language evolved from a need to negotiate

Evolution has honed the range of our talents


Humans today, I suggest, are the 
descendants of those who were best at using 
this social juggernaut to advance their inter- 
ests. The defining features of our nature 
— our ultra-sociality and language plus 
various innate talents and skills — arose as 
adaptations to living in the prosperous social 
environment of human culture, not from our 
shared history with other animals. 

Our capacity for culture rests on two 
building blocks that together create an 
unbridgeable gap in evolutionary poten- 
tial between us and all other species: social 
learning and ‘theory of mind’. Through
social learning, we can copy new behav- 
iours merely by observing others. And 
with our theory of mind we can attribute
mental states to others, allowing us to guess 
or understand their motives. We can then 
choose to copy the actions, ideas or inven- 
tions that have the best outcomes. 

Both characteristics may be unique to 
humans. What looks like social learning 
in other animals could be little more than 
socially influenced learning that makes use 
of behaviours already present in an animal's 
repertoire’. For instance, chimpanzees 
manipulate things with their hands, so when 
one chimp uses a rock to crack open nuts, or 
a stick to fish for termites, another might be 
drawn to playing with rocks or sticks. That 
might by chance lead to a nut being cracked 
open or a termite retrieved. The reward 
would then reinforce the behaviour even 
though there was no direct imitation. 

Some birds modify their behaviour when 
they know they are being observed by oth- 
ers of their species — as if ‘aware’ that the 
observer might use the knowledge it gains 
to its advantage. Thus, when a nutcracker 
bird sees another bird watching it while it 
hides its food, it will return alone later to 
hide the food in a new spot. This behaviour, 
also seen in other corvids, is intriguing, but 
it may just be a predisposition to respond 
to a learned behaviour; there is no good 
evidence beyond humans for a theory of 
mind”. In fact, most human two-year-olds 
show a greater understanding of others’ 
beliefs than even adult apes do. 

The upshot is that although some animals 
seem to have what we might call cultural 
‘traditions’ — birds pecking at milk-bottle 
tops to get cream, for instance, or chimpan- 
zees cracking open nuts with rocks — these 
habits do not evolve or improve over time’. 
Even after a million years, they will still be 
using the same techniques — unless they 
acquire true social learning and a theory 
of mind*. 

By comparison, human societies evolve 
steadily through cumulative cultural adap- 
tation. Our knowledge, skills and technolo- 
gies accumulate improvements and produce 
variety, as people imitate each other, choose 
from and modify existing forms and com- 
bine objects to make new ones*— when a 
shaped club was combined with a hand-axe, 
for example, the first hafted axe was born. 
The result is complex and varied culture 
that resembles animal cultural traditions 
about as much as a Bach cantata resembles
a gorilla beating on its chest. 


VISUAL THEFT 

This capacity for improvement demanded 
changes that are not observed anywhere else 
in nature. Altruism is one example. Humans 
cooperate with unrelated individuals and 
perform acts of generosity that might not 
be repaid. We trade and exchange things, 
but we also hold doors open for people, give
up seats on trains, contribute to charities
and risk our lives by pulling someone from
a burning building or fighting in a war. We
are oddly group-focused: happy to wear
silly matching shirts to sporting events, or
paint our faces in the colours of our national
flag, and keenly affected by the loss of our
soldiers in battle. Who can forget images of
Japan's fabled kamikaze warriors? You do
not see this in chimpanzees.

In the rest of the animal kingdom,
cooperation is generally confined to helping
relatives. The theory of kin selection
explains why: actions that support your
relatives benefit copies of your genes. But
this theory is mute in the face of the human
propensity to help strangers. We should therefore consider
humans as ‘ultra-social’, having broken free
of the usual genetic constraints on altruism.

Why do we behave in these ways? I 
suggest that around 160,000–200,000 years
ago our capacity for culture created a social 
crisis to which ultra-sociality was the evolu- 
tionary solution. That crisis was visual theft* 
— the capacity to steal others’ ideas. 

Because we can learn simply by watching 
others, knowledge is available to everyone 
and cultures can evolve and adapt at great 
speed. But if I watch which lure you use to 
catch a fish or how you haft a hand-axe, I 
benefit from your ingenuity as much as you 
do, possibly even more, because you had to 
spend time tinkering before you arrived at 
the solution I am now copying. I might even 
catch that fish before you do. 

Thus, once a species acquires social 
learning, it becomes advantageous to keep 
the best ideas secret, lest they be stolen. This
is illustrated today in our reluctance to share
ideas — whether they be old family recipes, 
knowledge of fishing lures or new scientific 
or business plans — and also in our many 
patents and copyrights. 

But hiding the best innovations would 
have brought cumulative cultural adapta- 
tion to a halt and caused our fledgling socie- 
ties to collapse under the weight of suspicion 
and rancour. To avoid this outcome, we had 
to evolve the social rules and psychology 
that make it possible for people to exchange 
their ideas, knowledge and technology with- 
out undue fear of being exploited. 

A great emphasis was then placed on 
demonstrating your own — and gauging 
others’ — worthiness, because knowledge 
and technology were now held collectively 
by the social group, which wouldn't want to 
share them with cheats or competitors. 

The many peculiar acts of altruism that 
describe our ultra-social nature evolved as 
costly ways for us to demonstrate our com- 
mitment, and thus our worthiness, to our 
cooperative group. The clearest way to show 
others that you are an altruist is to behave 
altruistically. The good reputations we earn 
attract altruism from others, which in turn 
grants us access to the material and social 
rewards of our communities. 

We take our ultra-sociality for granted, 
but once such a system got going we had no 
choice but to become altruism ‘show-offs’,
to compete with others for a slice of the 
cooperative pie. Our ultra-helpful nature 
is the altruism equivalent of a peacock’s 
tail, except that the peacock uses his tail 
to attract a mate — we use our altruism to 
secure the spoils of cooperation. 

Other unique features of our psychology,
including our norms and morality, our
expectation of fairness and our tendency
towards ‘moralistic aggression’ — punishing people who violate social
rules — are emotions and social mechanisms we evolved to police those
who might be tempted to exploit this fragile cooperative system.

Humans are uniquely group-focused: many will deck themselves out in their team’s colours.


SOMETHING TO TALK ABOUT 

Human language differs from the grunts, 
chirrups, roars, odours, chest thumping 
and colourful displays of the rest of the 
animal kingdom in that it is compositional. 
We speak in sentences made from discrete 
sounds — words — that take the role of 
subjects, objects and verbs. Some animals 
make noun-like sounds — vervet monkeys 
can signal the approach of ground-based 
versus aerial predators — but only humans 
have been proven to use sentences. 

Why? A number of ancient features of our 
anatomy and behaviour, such as our finely 
coordinated facial muscles or our primate 
tendencies to gesture, might have contrib- 
uted to elements of our language’. But they 
do not explain why it evolved. I suggest that 
the complicated forms of cooperation and 
exchange we evolved to defuse the crisis of 
visual theft demanded a social technology 
for handling our deals, for coordinating our 
activities, for negotiating agreements and 
for broadcasting our reputations’. Language 
is that piece of social technology. 

We acquired language because we were 
the only species with enough to talk about 
to pay for this expensive apparatus and 
the time and energy it takes to learn to use 
it. Lacking our social complexity, other 
animals don’t need language, but human 
societies probably could not exist without it. 

Even the simplest acts of exchange 
depend on language. Imagine you are good 
at making bows and I am good at making 
arrows, but our species has no language. 
I give you some arrows hoping you will 
give me bows in return. But you smile and, 
thinking my arrows are a gift, take them and 
walk off. I chase you, a scuffle ensues and 
I get stabbed with one of my own arrows. 
Now replay that scene with both actors hav- 
ing language: a cooperative and peaceful 
deal can be reached. 

Research shows that the Neanderthals 
had the same version of a segment of DNA, 
known as FOXP2, that we do, and that has 
been implicated in the fine motor move- 
ments we use for speaking, leading many 
to suggest that Neanderthals, too, had 
language. Yet little in the archaeological 
record points to cumulative cultural adap- 
tation in the Neanderthals’ — no musi- 
cal instruments, no art, no fish-hooks or 
spear throwers. They did not even sew 
clothes. From the rule of visual theft, I sug- 
gest this dearth of culture tells us that the 
Neanderthals did not have language. Their 
human-like FOXP2 might have given them
better communication abilities than other
mammals, but in explaining the appearance
of language we must look for the need for
it, not just for pieces of anatomy or genes
— some birds, for example, can mimic
human speech, but do not share our version of FOXP2.

Our ability to build on and modify inventions has given us a selective advantage.


DOMESTICATED BY CULTURE 

Humans have a surprisingly large range 
of abilities. Some of us are good at music, 
others at mathematics, design, language or 
sport, and all of these have been shown to 
have a significant genetic component*. Now,
natural selection is the process by which 
some genetic varieties survive at the expense 
of others. It favours melodic singers among 
songbirds, and fast runners among lions and 
their antelope prey — poor singers remain 
lovelorn (and childless) and slow runners 
hungry or dead. We might therefore expect 
differences among us to get erased by natu- 
ral selection. How, then, can we explain the 
diversity of human skills? 

I believe that this variety is yet another 
consequence of our capacity for culture. 
Once our cooperative systems made it pos- 
sible for people to exchange skills, goods 
and services, those who specialized at what 
they did best would have had the most to 
trade with others. In no other species is this 
possible, because no other species practises 
such a division of labour among unrelated 
individuals. 

Our cultures domesticated and sorted 
us by our various talents, encouraging 
the skills to co-exist*. It is a scenario we 
should recognize, having inflicted it onto 
countless domesticated animals, notably 
dogs. Breeds ranging from chihuahuas to
Newfoundlands bear the genetic marks of 
having evolved specialized temperaments, 
skills and morphologies in response to the 
social environment of human whims. 

Our genes might have been equally 
content to specialize to the opportunities 
our societies created, and if so, this could 
have implications that are relevant for con- 
temporary society. Most of us support the 
societal goal of ensuring equality of oppor- 
tunity. But if people have different innate 
skills, then such a policy could produce a 
‘genetic meritocracy’, a society differentiated
by innate predispositions. 


THE MODERN WORLD

There is evidence of an upturn beginning
around 40,000 years ago in the degree of 
positive selection acting on our genes’, 
and involving hundreds of them™. It may 
not be an accident that this coincided with 
a flourishing of human culture as seen in 
an explosion of artefacts, art and musical 
instruments, and in our occupation of the 
world. These fast-evolving genes consti- 
tute our wiring for culture, and they can be 
identified using the same methods that isolate
the genes that cause medical problems.

Modern societies differ vastly from the
small tribes that once competed to occupy 
Earth. But the old psychology plays out well 
in our globalized multicultural world. Our 
species’s history is the progressive triumph 
of cooperation over conflict as people recog- 
nized that cooperation could return greater 
rewards than endless cycles of betrayal 
and revenge. In a diverse world, the key 
to promoting this cooperation is to create 
among people a greater sense of trust and 
shared values that goes beyond the highly 
imprecise markers of ethnic or cultural dif- 
ferences. This is the social glue that has fos- 
tered our ultra-sociality and can continue to 
do so.

SEE BOOKS & ARTS P.304
Scientists are generating a wealth of human-DNA data, but no system exists for informing volunteers of their results.


Bring clinical standards to 
human-genetics research 


Study protocols need to be rigorous, because more than science is at stake. 
Sometimes participants’ lives depend on the results, writes Gholson J. Lyon. 


In November 2009, I met a family in Ogden, Utah, in which three boys
over two generations had died from an unknown disease with a distinct
combination of symptoms, including an aged appearance, facial
abnormalities and developmental delay. At the time, a fourth boy was
affected; he died a few months later. Like any researcher in human
genetics and biomedicine, I wanted to identify the genes behind this
disease. As a medical doctor, I heard their tragic story and drew blood
from several family members in their home. Using those samples, my
colleagues and I identified the genetic basis of this disease’, which we
named Ogden Syndrome after the town in which the family lives.

Then, in November 2010, another family member told me that she was four
months pregnant — and she was having a boy.

She was, understandably, very worried that she might be a carrier of the
mutation — two of her sisters had already lost one boy each to this
heartbreaking disease. My colleagues and I had sequenced her DNA for our
research, and the data suggested that she was a carrier, implying a 50%
chance that her son would be born with Ogden Syndrome. But when she
asked me what I knew, I hesitated.

I was not her physician; I was a researcher, 
and I had done this work on a research basis, 
not following the specific protocol required 
for performing validated clinical or diagnostic 
tests. I couldn't be totally sure that her individ- 
ual results were accurate. Should I share them 
with her anyway, knowing the devastation 
they could cause? What if I was wrong, and
she terminated the pregnancy? 

Now is perhaps one of the most exciting 
periods in human genetics and medicine — 
it is possible to sequence most of an entire 
human genome for less than the cost of many 
tests and procedures done routinely in clinical 
medicine, including magnetic resonance 
imaging scans and many types of surgery. 

But this rapid expansion is shining a 
spotlight on the problems with how that 
information is handled and processed. 
Specifically, researchers are largely unable
to share their findings with the people who
make that research possible: study participants.
At the moment, human-genetics researchers operate in a
totally unregulated environment, following
their own protocols to obtain, store, track and
analyse DNA — creating many opportunities
for error. Researchers take shortcuts to save
time and money, given that most never expect
(as I did not) that their results might have a
direct effect on the life of another human
being. But when the result can mean the
difference between life and death, mistakes are
not an option.

NATURE.COM Read more about Ogden Syndrome: go.nature.com/rhitjz

I suggest that we change the way we collect 
and process samples for human-genetics 
research. We should create a formalized pro- 
tocol akin to the rigorous process that doctors 
and other health-care workers go through 
during any clinical lab test, which practically 
eliminates the chances of mistakes and mix- 
ups. In this way, when participants want to 
know what we know, we will feel confident 
that what we tell them is correct. 

In 2009, after finishing my clinical train- 
ing, I moved to one of the best places in the 
United States for the genetic study of large 
pedigrees: Utah. I began to collect DNA 
samples from families with neuropsychiatric 
disorders, including individuals with severe 
developmental delay, mental retardation
and autism. I also began to understand the 
problems with how human-genetics research 
is conducted. 


WHEN THE UNEXPECTED OCCURS 

Towards the end of my first year in Utah, 
I began sequencing DNA from a family in 
which a father and two sons were affected 
by severe attention-deficit hyperactivity
disorder (ADHD). Before I had finished my
analysis” of the sequencing data, one of the 
sons revealed to me that he had a severe case 
of anaemia. Even though I was searching 
for the genetic cause of ADHD, as a physician
and a human being, I felt an ethical and
moral obligation to try to figure out whether 
he had any mutations that could have led to 
his anaemia. It turns out that he did. 

But I was not able to return any results 
to him, because this research was not performed
in a clinical environment. Wouldn't
it help him to know that the jaundice and 
other problems he had battled for the past 
20-plus years of his life were caused by two 
rare recessive mutations? Most importantly, 
as he moved forward in his life, wouldn't this 
information help him to decide with a future 
partner whether to undergo genetic counsel- 
ling, and perhaps even genetic testing, before 
conceiving any children? 

In the United States, all clinical laboratory 
testing performed on humans is regulated 
by the US Centers for Medicare & Medicaid 
Services in Baltimore, Maryland, through 
the Clinical Laboratory Improvement 
Amendments (CLIA). 

When a clinician orders a blood test for 
anaemia, that blood is drawn by a licensed 
phlebotomist in an accredited laboratory 
setting, and the sample tube is barcoded 
immediately, thus reducing to about zero 
the chances of mix-up. The blood sample is 
then processed in an accredited laboratory 
with reagents that are carefully documented 
and maintained, so that haemoglobin and 
haematocrit are assessed and calculated in 
the same way for that sample as for all other 
samples in that laboratory, each and every 
time. 

Even companies that perform direct-to- 
consumer genetic testing, such as 23andMe 
based in Mountain View, California, track 
the saliva samples quite carefully from the 
moment the tube is closed, so that the results 
can be returned to the consumer. 

Now, how do most scientists in the United 
States conduct human-genetics research? 
Not in the manner described above, and not 
under regulation by CLIA. Instead, blood 
is drawn by just about anyone who is able, 
and there is certainly no “treating physician” 
ordering the blood draw (that is, someone to 
be held medically and legally responsible if 
something goes wrong or is missed). 

Sometimes the sample tubes have barcodes;
sometimes they have only hand-written
labels. Often, the researchers themselves
extract the DNA, using standard reagents
or a ‘kit’ available from many different
companies, but there is rarely any tracking of the
reagents used. DNA is sometimes extracted 
at a core facility using one of any number of 
methods, and the transferral of the samples 
to the core facility requires that the tubes are 
passed from researcher to researcher, increas- 
ing the chances of human error. 

There is also extreme variability in how 
DNA samples are used, managed and stored. 
Some researchers might handle the same 
samples again and again, thus increasing 
chances for mix-ups or cross-contamina- 
tion. Indeed, authors of human-genetics 
papers commonly eliminate samples that 
they suspect were mixed up. Some research- 
ers store samples in a centralized biobank, 
but others use any freezer in the lab, creating 
many opportunities for error. There are no 
mandated guidelines for handling human- 
DNA samples in a research setting, precisely 
because such research is not regulated. 

I never expected that a research subject 
would tell me that he probably had a genetic 
condition besides the one I was studying at 
the time. Some people might argue that I 
shouldn't have looked for the cause of the 
anaemia, but to me, it seems ethically and 
morally wrong not to try. 

In the case of the Ogden family and the 
family with ADHD, I labelled samples by 
hand and gave them to a core facility at the 
University of Utah in Salt Lake City for DNA 
extraction. Although I was confident that I 
had performed each step as rigorously as
possible, none of the reagents were tracked
in any way similar to a clinical lab test. As
such, the results did not meet the very high
standards required of clinical tests.

I have since asked the physician of the
man with ADHD to follow up my results
with a CLIA-certified genetic test to confirm
that the man carries the anaemia mutations,
so that this clinician can release the information
to him. Even now, many months later,
the testing has not been performed, because 
most clinicians face roadblocks such as find- 
ing an available gene diagnostic test, dealing 
with insurance and arranging for appropriate 
genetic counselling. 

In the case of the woman who I suspected 
was carrying a male fetus with Ogden 
Syndrome, I faced a major dilemma owing 
to time constraints. Given that my results 
could have been incorrect, and might have 
caused undue stress and possibly even an 
unnecessary termination of the pregnancy,
I chose to “first, do no harm” — I did not
return my research result to her, and I 
instead attempted to validate it in a CLIA- 
certified lab at a major diagnostic facility. 
It was a long and bureaucratic process, but 
after several months, in July 2011, we had 
a formal, CLIA-certified genetic test for the 
specific mutation in NAA10, the gene associ-
ated with Ogden Syndrome.

Unfortunately, by that time, the woman had 
given birth to her son. As I had feared, he had 
the disease. Sadly, he died in June 2011, four 
days before the paper describing the mutation 
that killed him was published. 


EXPECT THE UNEXPECTED 

There are increasingly limited resources for 
biomedical research, and it can take 20 years 
or more to translate genetic discoveries into 
new drugs or other treatments. So why not 
help the families and research participants 
now, by deriving the highest possible value 
from every DNA sample we sequence? 

Participants want to be involved in the 
research process and be told about any 
medically important findings. I am there- 
fore suggesting that the entire process of 
DNA collection and genome sequencing for 
humans could and should be performed in 
a proper clinical environment, so that phy- 
sicians can immediately return all relevant 
genomic information much more easily, 
and perhaps even link such information to 
medical records so that it is available for re- 
analysis as our knowledge expands. 

This means establishing suitable guide- 
lines for the collection, tracking and 
sequencing of DNA samples from human 
participants, along with training health-care 
professionals in genetic counselling.

To make these changes possible, grant 
agencies should consider setting aside 
funding to establish clinically certified pro- 
tocols for handling human genomic data, 
including findings perhaps unrelated to
the original research goals². After all, these
agencies are supported by taxpayers, and the 
data ought to be given back to those donat- 
ing their time and DNA to the research. 
We cannot forget the wise words of the late 
geneticist Charles Epstein, from his 2001 
William Allan Award lecture: “the operative
word in ‘human genetics’ is ‘human’. Human
genetics is about human beings — about
humanity and humaneness.”³


Gholson J. Lyon is in the Institute for 
Genomic Medicine, Utah Foundation for 
Biomedical Research, Salt Lake City, Utah 
84106, USA. 

e-mail: gholsonjlyon@gmail.com 


1. Rope, A. F. et al. Am. J. Hum. Genet. 89, 28-43 
(2011). 
2. Lyon, G. J. et al. Discov. Med. 12, 41-55 (2011). 
3. Epstein, C. J. Am. J. Hum. Genet. 70, 300-313 
(2002). 


16 FEBRUARY 2012 | VOL 482 | NATURE | 301 


© 2012 Macmillan Publishers Limited. All rights reserved 


Regulate alcohol for global health 


The World Health Organization is the only body that can promote health through the 
use of international law. It should make alcohol its next target, says Devi Sridhar. 


Unlike any other global-health body,
the World Health Organization
(WHO) can create legally binding 
conventions, and it only requires a two- 
thirds majority vote to do so. Yet this power 
is vastly underused. In more than 60 years, 
this United Nations agency has produced 
only two major treaties: the International 
Health Regulations, which require coun- 
tries to report certain disease outbreaks and 
public-health events; and the Framework 
Convention on Tobacco Control, which 
commits governments to making legislative 
moves to reduce the demand for, and sup- 
ply of, tobacco. The WHO has shown 
a reluctance to use hard legal instru- 
ments. Instead, it tries to influence 
societal norms through guidelines and 
recommendations¹. This is a major
missed opportunity. 

Now is the time for the WHO to take
a bold step and move towards a third 
treaty to protect world health. There 
is an obvious target. About 2.5 million 
deaths a year, almost 4% of all deaths 
worldwide, are attributed to alcohol 
— more than the number of deaths 
caused by HIV/AIDS, tuberculosis 
or malaria². Alcohol consumption is
the world’s third-largest risk factor 
for health burden; in middle-income 
countries, which constitute almost 
half of the world’s population, it is the 
greatest risk (see ‘Health burdens’).

There are some good, evidence- 
based efforts for alcohol control 
already in place, such as the 2010 
WHO Global Strategy to Reduce Harmful 
Use of Alcohol. This document lays out 
ten areas in which action can be taken, 
from raising awareness to preventing 
drink-driving and restricting the avail- 
ability, marketing and pricing of alcohol. 
Its recommended policy interventions are 
general and sensible, including: banning 
unlimited drinks specials; enforcing a rea- 
sonable minimum age limit for purchase; 
and enacting graduated licensing for 
novice drivers with zero-tolerance for drink- 
driving. The strategy helpfully summarizes 
the cost-effectiveness of various strategies. 
But this is a portfolio of useful information 
and policy tips, not a binding document. 

A WHO Framework Convention on 
Alcohol Control could and should turn 
those recommendations into legal require- 
ments for member states. What difference 
would this make? Formally, countries would
commit to applying the agreement through 
national legislation — which would require 
a ream of new policies for nations such 
as India where current regulation isn’t so 
comprehensive. Nations would be required 
to report to the WHO on their progress. 
The international community would have 
a shared responsibility to support these 
efforts by providing financial and technical 
assistance as needed. Informally, ministries 
of health would have a stronger domestic 
negotiating position in prioritizing alco-
hol regulation above economic concerns.
Non-governmental organizations would
be able to pressure governments, and even
bring issues to court.

HEALTH BURDENS
Alcohol is the third-largest risk factor for loss of years to disease and
disability. The effect is largest in middle-income countries (2004 data).

The creation of a framework convention 
requires much political work and prepa- 
ration. The WHO secretariat should, for 
example, map out the positions of countries 
on alcohol use, their links to industry, and 
how best to overcome opposition in each 
nation. Doing so will require donor funding 
for a special cabinet project, as was done for 
tobacco. The overarching goal would be to 
assemble a ‘coalition of the willing and able’ 
to push this agenda forward in the World 
Health Assembly — the WHO's decision- 
making body. 
domestic public health. Despite a binding
Framework Convention on Tobacco Con- 
trol, tobacco use is increasing in many poor 
countries, and is still the second-largest 
cause of disease risk in middle-income 
countries. The problem is that oversight is 
minimal and no strong enforcement mecha- 
nisms exist, so compliance is weak. 


STRENGTHENED POWERS 

To help overcome such problems, the WHO 
should endorse a commission on global 
health law, headed by an independent 
expert. Through analysis of other regimes, 
such as those of trade and finance, that 
have arguably been more successful in 
utilizing international law, this com- 
mission could provide recommend- 
ations on how to strengthen the 
WHO’s normative power.

The WHO’s legal potential should
not be focused solely on individual
health hazards such as alcohol and
tobacco; it should be used to create
a broad framework convention on 
global health’. This would identify a 
basic package of health services that 
governments ought to provide; iden- 
tify who would be obliged to provide 
what; and examine how this could be 
achieved through reform of global 
health governance. 

To flourish in an environment
with numerous other better-financed
and more-inclusive institutions, the
WHO must take a hard look at itself
and what makes it special. Other
bodies can provide technical advice, give 
money, influence domestic health policy, 
assist in development and advocate for the 
importance of health in government policy. 
The WHO is the only body with the legiti- 
macy and authority to proactively promote 
health through the use of international law. 
It needs to do so.
With the advent of ancient cave art, human cultural innovation burst into being.


Custom built 


Culture is both a product and a driver of human 
evolution, finds Peter Richerson. 


When Charles Darwin discussed
the forces driving human evolution
in his 1871 book The Descent
of Man, he placed cultural change — mostly 
under such terms as traditions, customs 
and inherited habits — on an equal footing 
with organic evolution. Darwin's idea that 
cultural traits adapt, change and experi- 
ence selection as they are passed within and 
between generations attracted important 
followers among early behavioural scien- 
tists, but he had almost no influence on the 
anthropologists, sociologists and historians 
who dominated the study of human culture 
for most of the twentieth century. 


Yet the parallels between culture and 
biology had long been obvious — just look 
at the tree diagrams for language relation- 
ships and for related species. Evolutionary 
biologist Mark Pagel’s pioneering contribu- 
tion has been to show that this similarity is 
more than skin-deep, and that methods for 
revealing the evolutionary history of genes 
can illuminate the historical relationships 
between languages and other culturally 
transmitted behaviours. 

Wired for Culture provides a wide-ranging, 
non-technical survey of the field. Pagel’s two 
main themes are the role of cultural evolu- 
tion in patterns of cooperation and its part 
out of Africa around
60,000 years ago, in the 
adaptive radiations as they found differing
ways to exploit their local wild resources and 
to domesticate plants and animals. 

These changes required parallel inno- 
vations in social organization. In hunter- 
gatherer economies, between a few hundred 
and a thousand individuals cooperate to 
finesse risk and sustain a modest division of 
labour. Agricultural and industrial societies 
cooperate on far larger scales than hunter- 
gatherers to sustain an intricate system of 
exchanges between specialized workers with 
complementary skills. These workers are 
often unknown to each other, yet are utterly 
interdependent. 

Art and religion mobilize our emotions 
in support of collective projects. In Pagel’s 
terms, art and religion can act as enhancers 
that promote adaptive behaviour, but in other 
times and places, they can be selfish mind 
drugs — the cultural analogues of micro- 
bial pathogens. We use our extraordinary 
linguistic systems to operate these social 
structures by articulating morals through 
myths and stories, debating constitutions, 
negotiating deals, making requests, giving 
orders and making or breaking reputations 
through gossip. 

Pagel’s story of culture’s dominant role in 
human evolution is vivid and effective. Inevi- 
tably, he simplifies some concepts in order 
to craft an accessible, coherent narrative. For 
example, he portrays Neanderthals as the 
hapless victims of invading modern humans 
who used more sophisticated technology. In 
fact, much evidence suggests that modern 
humans usually made the same sorts of tools 
as Neanderthals, and the question of why this 
species disappeared is still quite open. 

Pagel has an interesting take on what is 
perhaps the deepest controversy among 
Darwinists: does culture fundamentally 
change the evolutionary dynamics of our
species? As biologist
culture to be a proximate phenomenon — a
mechanism for achieving something, rather 
than the ultimate, evolutionary reason for 
that achievement. 

The classic example of this distinction 
between ‘how’ and ‘why’ in biology is bird 
migration. The proximate causes of migra- 
tion are hormonal responses to changing 
day length that motivate and prepare birds 
for long-distance flight. The ultimate cause 
is selection for migratory behaviour, which 
in turn results in genetic change. A common
inference is that causation in biology flows 
one way — from selection, to genetic change, 
to traits adapted to their environment. But 
human culture raises two issues regarding 
the arrow of causation. 

First, there are reasons to think that genetic 
change might be a response to cultural shifts 
as well as their cause. Pagel certainly thinks 
so. For example, he speculates that division 
of labour in humans has created selection for 
innate differences in personality and talent. 
Even the simplest human societies divide 
work between men and women and then 
pool the results of their specialized labours. 
In complex societies, myriad occupations 
trade their specialized products to assem- 
ble the goods that each person needs. Pagel 
also reviews evidence that many genes came 
under selection after we spread out of Africa 
and diversified our economic systems. 

Second, culture creates an arena for selec- 
tion and inheritance that is separate from 
genes. For example, as Pagel emphasizes, 
much cultural variation exists at the tribal, 
rather than individual, level. Some of us, 
including myself, think that selection of 
cultural differences between proto-tribes at 
the group level — with successful groups dis- 
placing their neighbours’ cultures through 
the spread of people or their ideas — might 
have led to selection of genes favouring 
docility, empathy and group loyalty. 

Pagel briefly discusses this concept, but 
favours the view that even suicidal self- 
sacrifice can evolve as a result of self-inter- 
ested benefits to individuals, rather than to 
their groups. He puts great weight on systems 
that use reputation to monitor and enforce 
good behaviour, arguing that the benefits of 
a good reputation, and the effects of a bad 
one, more than outweigh the personal costs 
of helping other members of your group. The 
resolution of this issue is perhaps the most 
important task in the study of human evo- 
lution, and it is a pity that Wired for Culture 
does not convey what is at stake. But scholarly
quibbles aside, this is the best popular science
book on culture so far. SEE COMMENT P.297


Peter Richerson is a professor of ecology 
and human evolution at the University of 
California, Davis, USA. He is the co-author, 
with Robert Boyd, of Not By Genes Alone. 
e-mail: pjricherson@ucdavis.edu 


Books in brief 


Space Chronicles: Facing the Ultimate Frontier 

Neil deGrasse Tyson W. W. NORTON 368 pp. $26.95 (2012) 
Astrophysicist Neil deGrasse Tyson is on a space mission of his 
own. With NASA's research and exploration now diminished, Tyson 
— director of the American Museum of Natural History’s Hayden 
Planetarium — is keenly focused on what the United States will 
lose by failing to reinvent its space programme. In this collection, 
mined from 15 years of commentary and interviews edited by Avis 
Lang, he spotlights issues that underline the central importance of 
curiosity about the great beyond — from the nature of discovery to 
propulsion in deep space. 


The Evolved Apprentice: How Evolution Made Humans Unique 
Kim Sterelny MIT PRESS 264 pp. £24.95 (2012) 

We are inescapably different from the other great apes — sexually, 
morphologically and socially. In a book that forms part of the Jean 
Nicod Lecture Series, philosopher of biology Kim Sterelny tries
to answer the vexed question of why that is by arguing that our
divergence from our closest cousins over the past 3 million years 
is down to a gradually enriched learning environment. Cooperative 
foraging, he posits, paved the way for positive-feedback loops that, 
incrementally and over vast reaches of time, led to adapted minds 
that were nurtured within adapted environments. 


The End of Money: Counterfeiters, Preachers, Techies, Dreamers 
— and the Coming Cashless Society 

David Wolman DA CAPO PRESS 240 pp. £16.99 (2012)

‘Filthy lucre’, David Wolman shows us, is a particularly apt phrase. 
Minting technologies gobble huge amounts of metals, cotton and 
water; the transport and production of cash has a giant carbon 
footprint; and Staphylococcus bacteria have been detected on 94% 
of US one-dollar bills. In this fascinating book, Wired contributing 
editor Wolman argues that its end is nigh. He spent a cash-free year 
researching everything from counterfeiting to tax-dodging and failed 
currencies, and looks at digital solutions such as smart banknotes. 


From Melancholia to Prozac: A History of Depression
Clark Lawlor OXFORD UNIVERSITY PRESS 288 pp. £14.99 (2012) 
Seventeenth-century scholar Robert Burton may have anatomized 
melancholy, but the definitions, diagnoses and treatments of and for 
depression are still hotly debated by the pharmaceutical industry, 
psychiatry, psychology and affected citizens. Writer Clark Lawlor 
trawls history from the classical era onwards, shining some light on 
this psychological dark horse. Along the way, he touches on radical 
differences in cultural definitions, explores tensions between the 
biomedical model and humanistic concepts, and weighs up ‘cures’
from talking therapy to drugs.

The Song of the Ape: Understanding the Languages of Chimpanzees 
Andrew R. Halloran ST MARTIN’S PRESS 288 pp. $25.99 (2012) 

A chance observation drove primatologist Andrew Halloran to
study how chimpanzees communicate. While keeper to a group of
island-bound chimps, he saw five ousted members calmly using a 
rowing boat to escape, as if they had discussed a plan. Examining the 
histories of these five, he picked out and recorded dozens of phrases 
in their vocalizations. Weaving in the stories of controversial attempts 
to teach sign language to primates such as Nim Chimpsky, Halloran 
concludes that chimps have their own highly complex lexicon. 
A crude awakening


John Vidal is gripped by a book that reveals how natural 
riches can impoverish nations. 


When oil was first extracted and sent
from Lago Agrio, Ecuador, in July
1972, the military dictatorship
held a ceremony signalling a new era. People
dipped hands in the yellow-brown crude, 
children were baptized with it, and the first 
barrel was placed in a museum in Quito. 

Forty years later, Ecuador has extracted 
nearly half of its known reserves, earning 
it US$130 billion. That has paid for needed 
infrastructure, but has also led to widespread 
corruption, impoverishment, inequality, 
insurgency and environmental devastation. 
Crude oil has transformed Ecuador — just 
not in the way Ecuadoreans expected. The 
situation could be shifting there, but is a 
familiar one in the developing world. Politi- 
cal scientist Michael Ross devotes The Oil 
Curse to unpicking it. 

This ‘paradox of plenty’ afflicts as many 
as 40 other developing countries, among 
them Nigeria, Cameroon and Equatorial 
Guinea. So far, only a handful of countries 
have avoided it, including Norway, Britain 
and a few smaller Arab states. Analysts have 
struggled to explain the paradox ever since 
British economist Richard Auty recognized 
it in 1993, invoking reasons ranging from for- 
eign oil companies creaming off big profits to 
a simple lack of readiness for sudden riches. 

Ross largely dismisses such triggers, sug- 
gesting that the fault lies mostly in the nature 
of oil wealth itself. Modern oil revenues, he 
proposes, have a more powerful and harmful 
effect on poor countries than money from 
other minerals because the sums involved 
are huge (now consistently more than $100 
a barrel), do not come from taxing citizens 
and are easy to conceal from public scrutiny. 

Poor governance, Ross says, does play a 
part: oil-funded rulers can use ‘petro-dollars’ 
to block democratic reforms, an argument 
backed by stories from the pro-democracy 
uprisings of the Arab Spring. Protesters in 
oil-poorer countries such as Tunisia and 
Egypt found it easier to overthrow their rul- 
ers than did those in oil-rich states like Saudi 
Arabia, Libya and Algeria, he notes. 

Ross is less convincing in tracing the start 
of the ‘oil curse’ back to the early 1970s, when 
prices quadrupled in a few months and many 
governments seized control of their coun- 
tries’ oil industries. Before nationalization, 
he argues, oil-rich developing countries were 
not that different from others. His research
shows how such countries are today
more likely to be ruled by autocrats and to
descend into civil war than countries with no
oil reserves. Oil wealth also creates less eco-
nomic growth than it should, and produces
more work for men than women.

The Oil Curse: How Petroleum Wealth Shapes the Development of Nations
MICHAEL L. ROSS
Princeton University Press: 2012. 296 pp.
$29.95, £19.95

These are all good points, but Ross’s
exoneration of corporate and colonial
powers before the
1970s weakens his argument. To many 
developing countries, an oil curse is just an 
escalation of colonial pillage. Oil, along with 
land acquisitions, is simply the latest resource 
to be taken by rapacious companies and 
national elites, leaving the majority of citizens 
as bystanders in the development process. 
Ross bravely suggests remedies, but I fear 
that most will be dismissed as impractical or 
naively neo-liberal. Because he attributes the 
malaise partly to state ownership of oil assets, 
he advocates some level of privatization of 
oil industries. He skips over President Hugo 
Chávez's 2007 nationalization of Venezuela's
oil reserves to pay for social reforms, and 
fails to ask why any poor country would be 
allowed by its people to sell off its major asset.
Revenues from oil fields in developing nations such as Nigeria are rarely passed directly on to citizens.


In searching for solutions, Ross looks at 
reducing the size of petroleum revenues and 
imposing sanctions against “undemocratic” 
governments — but without taking into 
account the anger and global price rise this 
could provoke. He also urges governments 
and corporations worldwide to be more 
transparent, although critics might doubt 
that all would take him up on it. 

Ross wisely advises countries to distribute 
oil wealth directly to citizens. And he is useful 
on ‘barter’ contracts between nations, which 
avoid corruption by exchanging oil for goods 
or resources rather than money. But, sadly, he 
touches only briefly on the idea of leaving the 
oil in the ground, and doesn't mention Ecua- 
dor’s radical experiment along those lines, 
initiated by President Rafael Correa in 2007. 

Correa has asked the world to pay Ecua- 
dor not to extract oil reserves worth around 
$7.2 billion from a 1,200-square-kilometre 
block of the Yasuni national park. Some 
20% of Ecuador's remaining reserves are to 
be left untapped in return for around half of 
the revenue they would have been worth if 
exploited. In late 2011, with the full back- 
ing of the United Nations Development 
Programme and the polled approval of 
the Ecuadorean people, Correa pledged to 
“leave the oil in the soil”. The first $100-mil- 
lion tranche, to be used for conservation and 
renewable-energy projects, has been lodged 
with the UN. 

Economists have mostly shied away from 
full costings of the ecological and social 
devastation of oil use. Were they to do so 
with the thoroughness and authority dis- 
played by Ross in The Oil Curse, they might 
start to develop the new economic model 
for oil and other extractive industries that 
is so desperately needed.


John Vidal is environment editor of 
The Guardian newspaper in London. 
e-mail: john.vidal@guardian.co.uk


G. OSODI/PANOS 


CENTER FOR POSTNATURAL HISTORY 


NICK HIGGINS 


A dermestid beetle that dined on transgenic mosquitoes earned a place in Richard Pell’s museum. 


Q&A Richard Pell 


Transgene curator 


Next month in Pittsburgh, Pennsylvania, artist Richard Pell opens the Center for PostNatural 
History — a museum of bioengineered organisms. He talks about the joys and pitfalls involved 
in collecting genetically modified maize, mosquitoes and zebrafish. 


Why did you start the 
museum? 

As an artist, I made 
robots in an attempt 
to start an ethical 
conversation in the 
engineering commu- 
nity about funding 
and other political 
issues. Then I was introduced to synthetic
biology by one of the field’s pioneers,
Chris Voigt, who is now at the Massachu- 
setts Institute of Technology in Cambridge. 
I began to wonder why transgenic organisms 
weren't shown on the evolutionary tree. So I 
began collecting specimens of living things 
that had been intentionally and heritably 
altered by humans. 


What is the museum’s focus? 

The museum is essentially anthropocentric 
— it looks at the organisms that we alter, but 
also at how they alter us. The history within 
an engineered organism is vast, and repre- 
sents the continuum of human manipulation 
of plants and animals. For example, the rats 
we breed to develop human-like tumours 
will shape the progress of medical research, 
which in turn will have an effect on which 
of us survive. 


What specimens have you collected? 
Examples include GloFish, which are 
zebrafish that contain genes from bio- 
luminescent jellyfish and coral — the only 
transgenic organism you can buy in a pet 
shop. We couldn't acquire genetically modi- 
fied maize [corn] directly without entering 
into an elaborate licensing agreement with 
its developer, Monsanto. But by planting
maize kernels from shop-bought animal 
food and testing to see whether the plants 
survived the pesticide Roundup, we were 
able to add Roundup-Ready maize to our 
collection. We're not trying to be provoca-
tive; we're just documenting thoroughly.


What specimens don’t you have? 

The ‘biosteel goat’ designed by Canadian 
company Nexia Biotechnologies. It pro- 
duces milk containing spider silk that 
could be used to make stronger bulletproof 
vests — one of the first ‘biofactories’. The
US Defense Advanced Research Projects 
Agency [DARPA] moved half of the herd to 
a decommissioned air-force base in upstate
New York. The other half went to an ongoing 
research project at the University of Wyo- 
ming in Laramie. The status of the DARPA 
goats remains unknown. Our exhibit con- 
sists of a diorama of the military goat farm, 
based on images from Google Earth. 
Does the collection include any insects?
I had a nice stock of mosquitoes, which had
been altered so that they could not harbour 
the parasite that causes malaria. This is one 
of a handful of organisms engineered to 
be released into the wild. Last year, when I 
opened up my mosquito box, it was empty 
apart from a dermestid beetle larva, a com- 
mon pest in museums. Despite my attempt 
to make a habitat, it died, but because its diet 
consisted exclusively of genetically engi- 
neered mosquitoes, it earned a place in the 
collection. 


Why do you have an exhibit on ‘genetic copy 
prevention’? 

Companies wanting to sell living things 
perceive a fundamental problem: their prod- 
ucts reproduce for free. One solution is the 
terminator gene patented by Monsanto, an 
on-off switch that allows the organism to 
reproduce in the lab but that makes it go ster- 
ile in the wild. There are other approaches 
to limiting reproduction: in the late 1950s, 
the United States built factories to irradiate 
millions of screw worms, which feed on the 
living flesh of livestock. Sterilized male screw 
worms were then dropped from aeroplanes 
so that wild females would mate with them. 
This eradicated the insect from US livestock. 


Does the museum cover genetically altered 
humans? 

We don’t have exhibits on that topic yet, 
but we do archive this type of research. For 
example, in 2007, researchers at Cornell 
University in New York produced a trans- 
genic human embryo that expressed a red 
fluorescent protein from coral, and allowed 
it to grow for five days before terminating 
it. And in gene therapy, a hacked retrovirus 
inserts foreign DNA into a patient’s genome 
to produce a certain protein. That change 
is not supposed to be heritable, but there is 
the possibility that the virus could make its 
way into a germ cell. Although both of these 
examples are minor changes in compari- 
son with how humans have altered other 
organisms, I think that this will be an area 
of interest for the museum in the future. 


How does the museum deal with people’s 
biased views? 

The rhetoric around altered organisms 
has become narrow, both for those who 
are afraid of ‘frankenfoods’ and those who 
believe that genetic engineering will cure 
cancer. People often want to have their own 
belief system mirrored in your rhetoric, or 
at least they want someone else’s bias so that 
they can recognize and argue with it. Other- 
wise they must argue with themselves, which 
is uncomfortable but exactly the experience 
we want them to have.


INTERVIEW BY JASCHA HOFFMAN 
Correspondence


No catch to UK 
charity funding 


Universities receiving upwards 
of £1 billion (US$1.6 billion) in 
total annually from UK medical 
research charities would probably 
disagree that their benefactors are 
‘hijacking’ university resources 
(Nature 481, 260; 2012). 

The UK government supports 
charity-funded research as part 
of its higher-education funding. 
This enables charitable funds, 
often donated by the public, to be 
spent directly on research while 
the government pays universities 
to cover the costs of overheads 
and infrastructure. 

Charities themselves invest 
in research infrastructure 
and resources. For example, 
the Wellcome Trust and 
Cancer Research UK are 
collaborating with the Medical 
Research Council and three 
London universities to invest 
£650 million in the Francis Crick 
Institute, a world-class research 
centre. The British Heart 
Foundation has committed 
£10 million to new university 
buildings since 2010, including 
£1 million towards the Centre 
for Regenerative Medicine at the 
University of Edinburgh. 

The UK research base benefits 
from the breadth and diversity 
provided by a mix of public, for- 
profit and charitable funders. 
Mark Walport Wellcome Trust, 
London, UK. 
m.walport@wellcome.ac.uk 
Iain Foulkes Cancer Research 
UK, London, UK. 

Peter Weissberg British Heart 
Foundation, London, UK. 
Delyth Morgan Breast Cancer 
Campaign, London, UK. 
Sharmila Nebhrajani 
Association of Medical Research 
Charities, London, UK. 


Whaling: don’t trade 
the moratorium away 


In their proposal to allocate 
‘whale shares’ to both whalers 
and conservationists as an 
alternative to the International
Whaling Commission (IWC)
moratorium on commercial 
whaling, Christopher Costello and 
colleagues overlook several factors 
(Nature 481, 139-140; 2012). 
Commercial whaling is in 
decline. In Japan, it is becoming 
less economically viable as 
consumer demand and whale- 
meat sales revenues fall — even 
with an increasing government 
subsidy, which this year is roughly 
¥2.3 billion (US$30 million). 
Demand is also waning in 
Iceland and Norway. A ban on 
international trade prevents these 
countries from securing new 
markets. Last year, the global 
stockpile of unwanted whale meat 
reached more than 7,000 tonnes. 
The effective management of 
commercial whaling would cost 
a lot more than its protagonists 
can afford and than non-whaling 
nations are willing to pay. 
Costello et al. also overlook 
the high costs of the independent 
surveys and analysis that would 
be needed to generate safe 
quotas for whaling, as well as the 
international compliance scheme 
required to enforce regulations. 
The IWC’s founding treaty 
does not allow for quotas to be 
allocated to individual countries. 
Its renegotiation to facilitate a 
scheme such as Costello and 
colleagues describe would require 
unanimity, which is currently 
unthinkable. Given all this, it 
would be foolhardy to trade away 
the moratorium. 
Mark Peter Simmonds Whale 
and Dolphin Conservation 
Society, Chippenham, UK. 
mark.simmonds@wdcs.org
Sue Fisher Animal Welfare 
Institute, Washington DC, USA. 


Whaling: ways to 
agree on quotas 


The sticking point in discussions 
of whale conservation schemes 
(Nature 481, 139-140; 2012)
has been reaching agreement
on the total catch that each
whale population can sustain. 
The International Whaling 
Commission's scientific 


308 | NATURE | VOL 482 | 16 FEBRUARY 2012 
© 2012 Macmillan Publishers Limited. All rights reserved 


committee has developed
and simulation-tested an
adaptive algorithm, the Revised
Management Procedure (RMP),
to determine safe catch limits. For
most populations, RMP limits
are below the numbers discussed
during the commission's failed
2010 negotiation of catch levels.

The whale-related expenditure 
of most of the organizations 
mentioned by Costello and 
colleagues goes largely to 
scientific research and outreach, 
not to protests. These funds have 
helped to develop tools such as 
the RMP, genetic techniques to
monitor markets and improved
methods for estimating whale 
numbers and demography. 

If conservation organizations 
were to ‘buy’ whales, it would not 
necessarily reduce the numbers 
killed. Most catch quotas set by 
governments lie well above the 
numbers actually taken. Even if 
the more conservative RMP catch 
limits were applied to a new whale 
market, ‘buying’ a given number 
would not save that number over 
time. Under an adaptive-feedback 
management procedure such as 
the RMP, killing fewer whales one 
year tends to increase catch limits 
in subsequent years. 
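The feedback effect can be illustrated with a deliberately simplified model. The sketch below is not the RMP, whose algorithm is considerably more elaborate. It assumes hypothetical logistic population growth and a hypothetical rule that sets each year's limit as a fixed fraction of the current stock. Leaving some whales uncaught in the first year raises the stock, and so raises the limits that follow.

```python
def next_stock(stock: float, catch: float, growth_rate: float = 0.04, capacity: float = 10_000) -> float:
    """One year of logistic growth minus the catch (hypothetical parameters)."""
    return max(stock + growth_rate * stock * (1 - stock / capacity) - catch, 0.0)


def catch_limit(stock: float, harvest_fraction: float = 0.02) -> float:
    """Toy feedback rule (not the RMP): the limit is a fixed fraction of the current stock."""
    return harvest_fraction * stock


def yearly_limits(initial_stock: float, first_year_catch: float, years: int) -> list[float]:
    """Catch limits over time when the first year's catch may fall below its limit."""
    stock, catch, limits = initial_stock, first_year_catch, []
    for _ in range(years):
        limits.append(catch_limit(stock))
        stock = next_stock(stock, catch)
        catch = catch_limit(stock)  # after year one, the full limit is taken
    return limits


if __name__ == "__main__":
    full = yearly_limits(5_000, first_year_catch=100, years=5)    # whole limit taken
    bought = yearly_limits(5_000, first_year_catch=50, years=5)   # 50 whales 'bought'
    for year, (a, b) in enumerate(zip(full, bought), start=1):
        print(f"year {year}: limit {a:.1f} (full catch) vs {b:.1f} (after buying)")
```

Run over a long horizon, nearly all of the 50 whales spared in year one are taken back through the higher limits of later years in this toy setting.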

Justin G. Cooke Centre for 
Ecosystem Management Studies, 
Emmendingen, Germany. 
jgc@cems.de 

Russell Leaper University of 
Aberdeen, UK. 

Vassili Papastavrou University 
of Bristol, UK. 


Big data deserve a 
bigger audience 


The huge repositories of data 
collected by services such as 
Twitter, Facebook and Google 
can cause serious problems 
beyond quality control (Nature 
481, 25; 2012). 

Many of the emerging ‘big 
data’ come from private sources 
that are inaccessible to other 
researchers. The data source 
may be hidden, compounding 
problems of verification, as 
well as concerns about the
generality of the results.

These results are meaningful 
only if many other data sets 
reveal the same behaviour. This 
uncovers a deeper problem: if 
an independent set of data fails 
to validate results derived from 
privately owned data, how do we 
know whether it is because those 
data are not universal or because 
the authors made a mistake? 

If this trend continues, we 
could see a small group of 
scientists with access to private 
data repositories enjoying an 
unfair amount of attention at 
the expense of equally talented 
researchers without these 
‘connections’.

Bernardo A. Huberman 

HP Labs, Palo Alto, California, 
USA. 
bernardo.huberman@hp.com 


Data audits could 
curb misconduct 


Universities and government 
research institutes could perhaps 
learn from the private sector 
when it comes to curbing 
research misconduct (Nature 
481, 237-238; 2012). 

Research entities should 
undergo independent audits 
of scientific data annually by 
certified public scientists, in 
much the same way as businesses 
and not-for-profit organizations 
are independently audited by 
certified financial accountants 
(J. L. Glick Ann. NY Acad. Sci. 
265, 178-192; 1976). Data audits 
are common for corporate 
biotechnology laboratories, but 
not for academic ones. 

I estimated that the costs 
of funding questionable 
research practices (such as data 
misrepresentation and fabrication 
of results) could be reduced by 
US$5-10 for each dollar spent 
on data audits (Account. Res. 2, 
153-168; 1992). Besides being a 
cost-effective way of monitoring 
the integrity of research 
organizations, data auditing helps 
to reveal genuine errors. 
J. Leslie Glick Tampa, Florida, 
USA. jlglick@ix.netcom.com 
RNA discrimination


In the cell, genomic DNA is transcribed into various types of RNA. But not all RNAs are translated into proteins. Does this 
give protein-coding RNAs greater credibility in terms of function? Views differ. 


THE TOPIC IN BRIEF

• Among RNA populations, messenger RNA arguably holds special status.

• Its mere transcription from DNA is considered sufficient evidence for its function as a protein-coding sequence.

• Whether tens of thousands of non-protein-coding RNAs (ncRNAs) are equally important is debatable.

• One argument is that unless a function is discovered for an ncRNA, transcription per se is not enough to suggest that it has a function.

• The alternative viewpoint is that if ncRNAs are transcribed, it must be for a reason.


Quantity or quality?
MONIKA S. KOWALCZYK & DOUGLAS R. HIGGS


In many species, ncRNAs are abundant and bewilderingly complex. What we wonder is whether they all carry genetic information (as do all mRNAs), or whether some of them are the by-products of abnormal or inconsequential transcription.

Many short ncRNAs, which are often derived from long ncRNAs (lncRNAs), regulate gene expression (Table 1). Moreover, full-length lncRNAs may themselves have biological roles. Take, for example, the RNA product of the XIST gene, an lncRNA that effects inactivation of the X chromosome. Many ncRNAs are transcribed from intergenic regions around genes and their regulatory elements (for example, enhancers and promoters). Some overlap with protein-coding genes in both sense (the direction of transcription) and antisense orientations. Others lie in intergenic regions far from protein-coding genes.

In contrast to protein-coding genes, however, genomic sequences encoding ncRNAs have often been poorly conserved during evolution. Also, very few natural mutations in ncRNAs have been shown to be the main cause of genetic diseases in humans, and few functionally important mutations in ncRNA-encoding genes have been identified in animal models. This suggests that, in contrast to many protein-coding genes, individual ncRNAs have a relatively minor role in biological processes.

If some ncRNAs are non-functional, why are they transcribed? Here, it may be that the level of expression is the significant factor. The ever-improving technologies for sequencing the transcriptome (the cell’s complement of total RNA) can now detect RNAs present at an average of less than one copy per cell. At what stage does such information pass from revealing additional genetic complexity to simply detecting the inevitable by-products of transcription from accessible, activated chromatin (DNA-protein complexes)?

Indeed, the efficiency with which all stages of transcription and RNA processing are performed is intimately related to the physical and chemical state of the associated chromatin. The cell has evolved complex systems to suppress promiscuous transcription from chromatin both within genes and from intergenic regions. Furthermore, specific pathways degrade aberrant or irrelevant transcripts. When these constraints are removed experimentally (and presumably when naturally modified in vivo), some irrelevant RNA transcripts are likely to be produced from many promoter- and enhancer-like elements that are accessible in chromatin.

“The onus is on scientists to unequivocally demonstrate the biological roles of these molecules.”

The genome contains many more enhancers than it does protein-coding genes, and these determine when and where genes are expressed. Enhancers are widely dispersed throughout the genome. Those that occur between gene sequences produce a variety of RNAs, including lncRNAs, but very few of these transcripts have been shown unequivocally to have a function (Table 1). Enhancers located within genes also produce long transcripts known as multi-exonic enhancer RNAs (meRNAs; Table 1), which resemble mRNA. Nonetheless, these transcripts have a very low coding capacity, and their role, if any, is unknown.

Although many RNAs emanating from 
enhancers, promoters and other genomic 
elements that regulate gene expression may 
represent inconsequential transcription, trans- 
cription per se may be required for establishing 
or maintaining the activity of these elements 
and for ‘templating’ the associated chroma- 
tin. If so, information carried in the sequences 
of these ncRNAs would be largely irrelevant. 

The huge number and complexity of RNAs being documented is certainly of great interest, and it would be surprising if evolution had not selected a proportion of these for their biological function. However, the onus is on scientists to unequivocally demonstrate the biological roles of these molecules, rather than presuming that they are all functionally relevant components of the transcriptome. For starters, ncRNAs should be accurately classified by fully defining their associated transcriptional units and their patterns of expression during development and cell differentiation, as recently set out. This, in turn, should direct the challenging experiments required to determine how various ncRNAs act, individually or in groups, to exert their proposed biological effects.


Monika S. Kowalczyk and Douglas R. Higgs 
are at the MRC Molecular Haematology Unit, 
Weatherall Institute of Molecular Medicine, 
University of Oxford, Oxford OX3 9DS, UK. 
e-mail: doug.higgs@imm.ox.ac.uk 


Patience is a virtue
THOMAS R. GINGERAS 


An open mind is not an uncritical one,
and the obligation to be critical as sci- 
entists should not necessarily condition us to 
look unfavourably on unexpected results. We 
should therefore not be too sceptical about 
ncRNAs just because we don't know their func- 
tions. But let me start with an outline of where 
the debate originates. 
TABLE 1 | MAIN CLASSES AND FUNCTIONS OF MAMMALIAN NON-CODING RNAS

ncRNA* | No. of known transcripts† | Transcript lengths (nucleotides; nt)‡ | Functions

Precursors to short RNAs
miRNA | 1,756 | >1,000 | Precursors to short (21-23 nt) regulatory RNAs
snoRNA | 1,521 | >100 | Precursors to short (60-300 nt) RNAs that help to chemically modify other RNAs
snRNA | 1,944 | 1,000 | Precursors to short (150 nt) RNAs that assist in RNA splicing
piRNA | 89 | Unknown | Precursors to short (25-33 nt) RNAs that repress retrotransposition of repeat elements
tRNA | 497 | >100 | Precursors to short (73-93 nt) transfer RNAs

Long ncRNAs
Antisense ncRNA | 5,446 | 100->1,000 | Mostly unknown, but some are involved in gene regulation through RNA interference
Enhancer ncRNA (eRNA)§ | >2,000 | >1,000 | Unknown
Enhancer ncRNA (meRNA)|| | Not fully documented | As variable as the length of mRNAs | Unknown, but they resemble alternative gene transcripts
Intergenic ncRNA | 6,742 | 107-10° | Mostly unknown, but some are involved in gene regulation
Pseudogene ncRNA | 680 | 107-10* | Mostly unknown, but some are involved in regulation of miRNA
3' UTR ncRNA | 12 | >100 | Unknown

*miRNA, microRNA; snoRNA, small nucleolar RNA; snRNA, small nuclear RNA; piRNA, piwi-interacting RNA; tRNA, transfer RNA; antisense ncRNA, transcripts mapping and overlapping coding and non-coding RNAs; enhancer ncRNA (eRNAs and meRNAs), transcripts that initiate within regions that regulate specific genes; intergenic ncRNA, transcripts that map to genome regions between annotated genes; pseudogene ncRNA, transcripts that come from processed or unprocessed pseudogenes; 3' UTR ncRNA, 3'-untranslated regions of ncRNA.
†From ref. 13.
‡Summarized from a range of lengths.
§From ref. 16. Transcript number listed comes from the analysis of one cell line (mouse neuronal cells) and is a significant underestimate.
||From ref. 4. Analysis was done in mouse erythroid cells.

Within the past decade, reports that the mouse and human genomes were pervasively transcribed (meaning “that the majority of its bases are associated with at least one primary transcript”) into predominantly ncRNAs were surprising and resulted in two types of criticism. The first of these centred on whether the detected RNAs are artefacts of the technologies used to identify them. The second objection focused on the biological importance of such transcripts. Unfortunately, these criticisms were often intermingled, requiring subsequent correction.
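The definition of pervasive transcription quoted here is essentially an interval-coverage calculation. A minimal sketch follows, assuming transcripts are given as hypothetical half-open (start, end) coordinates on a single toy chromosome. Real analyses would also have to handle strands, multiple chromosomes and mapping uncertainty.

```python
def covered_fraction(transcripts: list[tuple[int, int]], genome_length: int) -> float:
    """Fraction of genome positions [0, genome_length) inside at least one transcript.

    Transcripts are half-open (start, end) intervals; overlapping or touching
    intervals are merged so that no base is counted twice.
    """
    covered = 0
    run_start, run_end = None, None
    for start, end in sorted(transcripts):
        start, end = max(start, 0), min(end, genome_length)  # clip to the genome
        if start >= end:
            continue
        if run_end is None or start > run_end:
            if run_end is not None:
                covered += run_end - run_start  # close the previous merged run
            run_start, run_end = start, end
        else:
            run_end = max(run_end, end)  # extend the current merged run
    if run_end is not None:
        covered += run_end - run_start
    return covered / genome_length


if __name__ == "__main__":
    # Hypothetical transcripts on a 100-base toy genome.
    fraction = covered_fraction([(0, 40), (30, 60), (80, 90)], genome_length=100)
    print(f"covered fraction: {fraction:.2f}, pervasive: {fraction > 0.5}")
```

Merging the intervals before summing matters because overlapping transcripts (such as sense and antisense pairs) would otherwise count the same bases more than once.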

The overwhelming majority of novel 
ncRNAs have three properties that suggest 
they should be ignored. The transcripts seem 
to have greatly reduced protein-coding poten- 
tial; their expression levels are markedly lower 
than those of mRNAs; and their expression is 
mostly cell-type specific. Moreover, genomic 
sequences encoding these transcripts map 
to regions that were previously thought to be 
either untranscribed (sequences in the oppo- 
site strands to genes, and sequences between 
genes) or uninformative (intron sequences 
within genes, which do not make it into 
mature mRNA). 

The artefact objection has now been 
addressed by results from many labs showing 
a wealth of ncRNA expression using a wide 
range of technologies (tiling arrays, high- 
throughput RNA sequencing, full-length 
complementary DNA cloning, northern 
hybridization and RNase protection). Deter- 
mining the biological importance of ncRNAs 
is more challenging and an area of active 
investigation. Nonetheless, as the efforts to 
catalogue and characterize such RNAs get
under way, the initial atmosphere of scepticism
continues to hang over this subject. 

Healthy scepticism is an essential element of the scientific process. But it seems curious that ncRNAs have been deemed less interesting than mRNAs simply as a result of the short time since their discovery and the poor understanding of their biological roles. It is perhaps worth recalling that it took almost eight years from the discovery of the first member of microRNAs (lin-4 miRNA) to the elucidation of the function of the very large class of short ncRNAs to which it belongs. The functions attributed to miRNAs include such fundamental biological processes as control of developmental timing (miR273), organ development (miR84), tissue growth (miR181) and tumorigenesis (miR17).

“We should not be too sceptical about non-coding RNAs just because we don’t know their functions.”

According to the careful annotation by the GENCODE group, there are currently some 161,000 human transcripts, 85,323 (53%) of which are ncRNAs. Although the biological function of most of these ncRNAs is unclear, roughly 2% are precursors to miRNAs. In addition, 10% are lncRNAs that map to intergenic and intronic regions, and many of these transcripts have been implicated in regulation (both locally and from a distance) of developmentally important genes. Notably, another 16% of the annotated ncRNAs map to pseudogenes, genes that have lost their original functional abilities. And some of these have been shown to regulate gene expression by acting as decoys for miRNAs.

With the growing identification of func- 
tional classes of ncRNA and understanding 
of the various roles that many of these 
transcripts have, the original atmosphere 
of pessimism concerning their biological 
importance should gradually change to one 
of cautious interest. The scientific process is 
not free of bias. But openness to fresh pos- 
sibilities has the potential to reveal many 
new ideas.
SEE INSIGHT P.321
ANALYTICAL CHEMISTRY


Ultrasensitive 
radiocarbon detection 


Radiocarbon is rare, forming no more than one part per trillion of the total carbon 
content of the atmosphere. An optical method allows radiocarbon to be detected 
at roughly 25-fold lower levels than this, opening up fresh avenues of research. 


RICHARD N. ZARE 


Radiocarbon dating is an invaluable technique for determining the age of carbon-containing samples up to about 50,000 years old. Until now, the only method available for measuring levels of radiocarbon (carbon-14) in a sample has been high-energy accelerator mass spectrometry, but the apparatus involved is bulky, expensive and complex. Reporting in Physical Review Letters, Galli et al. describe an optical technique for measuring radiocarbon concentration that might overcome these problems. Their approach promises to greatly extend the use of radiocarbon measurements for dating, and as a tracer technique for following the fate of organic compounds in the body. It forms part of a growing revolution that is replacing mass spectrometry with optical methods for isotope analysis.

On Earth, there are three naturally occurring isotopes of carbon. The most abundant of these (99%) is carbon-12, with most of the rest being carbon-13. But carbon-14 also occurs in trace amounts, forming only as much as 1 part per trillion (0.0000000001%) of the carbon in Earth’s atmosphere. Unlike ¹²C and ¹³C, ¹⁴C is radioactive, decaying with a half-life of about 5,730 years, a fact that makes it potentially useful as a radiolabel for several applications.

Carbon-14 is mainly produced in Earth’s atmosphere from the bombardment of nitrogen molecules by cosmic rays. Plants fix atmospheric carbon dioxide during photosynthesis, and so the level of ¹⁴C in plants (and in animals that eat plants) when they die approximately equals the level of the isotope in the atmosphere at that time. Because the amount of ¹⁴C in dead organisms subsequently decreases as a result of radioactive decay, the date of carbon fixation (or death) can be determined by measuring the amount of the isotope in the remains. This is the basis of radiocarbon dating, the technique that has been a workhorse for estimating the age of organic remains from archaeological sites. In practice, the ratio of the
number of ¹⁴C atoms to the total number of
other carbon atoms in a sample is measured. 
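The decay arithmetic behind radiocarbon dating can be sketched as follows. The sketch assumes that the sample started with the atmospheric ¹⁴C level, as described above, and uses the 5,730-year half-life quoted in the text. Real laboratories also apply calibration curves, because atmospheric ¹⁴C levels have varied over time. That step is omitted here.

```python
import math

HALF_LIFE_YEARS = 5730.0  # half-life of carbon-14 quoted in the text


def radiocarbon_age(fraction_remaining: float) -> float:
    """Years since carbon fixation, given the measured 14C level as a
    fraction of the level the sample had when it died (0 < f <= 1)."""
    if not 0.0 < fraction_remaining <= 1.0:
        raise ValueError("fraction_remaining must lie in (0, 1]")
    decay_constant = math.log(2) / HALF_LIFE_YEARS  # per year
    # N(t) = N0 * exp(-decay_constant * t)  =>  t = -ln(N / N0) / decay_constant
    return -math.log(fraction_remaining) / decay_constant


if __name__ == "__main__":
    for f in (0.5, 0.25, 0.01):
        print(f"{f:>5}: {radiocarbon_age(f):,.0f} years")
```

The same relation explains the practical age limit. After about 50,000 years only about 0.24% of the original ¹⁴C remains (2^(-50000/5730) ≈ 0.0024), so the signal becomes very hard to measure.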

Because of the paucity of ¹⁴C, radiocarbon dating presents a huge technical challenge. Most isotope-ratio measurements are carried out using mass spectrometry, in which ions are weighed by measuring their trajectories in electric and/or magnetic fields in a vacuum. Standard mass spectrometers, however, do not have sufficient resolution to distinguish the small mass difference between ¹⁴C and ¹⁴N, the most common isotope of nitrogen. The abundant presence of ¹⁴N therefore tends to mask
any signal from radiocarbon. 
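A back-of-the-envelope calculation shows how demanding this separation is. The sketch below uses standard tabulated atomic masses for ¹⁴C and ¹⁴N. These values are supplied here and are not given in the article. For singly charged ions the mass difference is essentially the same, and the required resolving power m/Δm comes out at roughly 83,000.

```python
# Atomic masses in unified atomic mass units (standard tabulated values).
MASS_C14 = 14.003241989
MASS_N14 = 14.003074004


def resolving_power(m1: float, m2: float) -> float:
    """Mass resolving power m / delta_m needed to separate two masses."""
    mean_mass = (m1 + m2) / 2
    return mean_mass / abs(m1 - m2)


if __name__ == "__main__":
    delta = MASS_C14 - MASS_N14
    print(f"mass difference: {delta:.6f} u")
    print(f"required resolving power: {resolving_power(MASS_C14, MASS_N14):,.0f}")
```

The two nuclides differ by less than two parts in 10,000 of a mass unit. This is why accelerator methods instead exploit a chemical difference between the ions rather than mass alone.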

High-energy accelerator mass spectrometry 
can overcome this problem. For this technique, 
the sample must first be turned into solid 
carbon (graphite) using a series of chemical 
transformations. It is then bombarded with 
caesium ions to produce negatively charged 
carbon ions, which are accelerated by a posi- 
tive voltage of millions of volts. The negative 
ions, by now travelling at a few per cent of the 
speed of light, are subsequently converted 
into positive ions by an electron stripper 
(which consists of a gas or a thin foil), before 
the ions’ masses are determined. Because 
nitrogen atoms do not form stable negative 
ions, the resulting data are free from nitrogen 
interference. But although accelerator mass 
spectrometers are powerful tools, they are 
also costly. Establishing and maintaining such 
an instrument costs millions of dollars, and so 
they tend to be found only at national facilities. 

A much simpler approach is to completely oxidize a sample so that every carbon atom is turned into carbon dioxide. The various isotopic forms of carbon dioxide can then be distinguished from each other because each has a slightly different infrared spectrum (they absorb slightly different frequencies of infrared light). All that is required to determine the ratios of carbon isotopes in a sample of carbon dioxide is to precisely measure the intensities of the spectral lines that correspond to infrared absorption for each isotopic form of the gas. Other compounds, such as water vapour and nitrogen, do not interfere in the infrared spectrum, either because they have different infrared spectra or because they do not have ‘allowed’ infrared transitions (that is, quantum mechanics prevents the molecules from undergoing energy transitions that would be detected in the infrared).

This optical approach has already been used to determine the carbon-isotope ratio of carbon dioxide containing ¹²C and ¹³C, but the low concentration of ¹⁴C has made its measurement in carbon dioxide extremely difficult. Using an ultrasensitive technique called saturated-absorption cavity ring-down spectroscopy, Galli et al. have now succeeded in measuring the ratio of ¹⁴C to total carbon at values well below radiocarbon’s natural abundance in carbon dioxide.

In their technique, the authors placed a gas 
sample between two or more highly reflecting 
mirrors that form an optical cavity. Infrared 
light that is incident on the cavity continually 
circulates within it, so that it takes many round 
trips. This effectively increases the optical path 
length of the light, allowing infrared absorp- 
tion by the gas to be detected with a sensitiv- 
ity that vastly exceeds what can be achieved in 
traditional absorption experiments. 
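A simple model gives a rough sense of the gain. It assumes two mirrors of equal reflectivity R and ignores all other losses. Light survives each reflection with probability R, so it makes on average 1/(1 - R) passes, and the effective path length is L/(1 - R). The cavity length and reflectivity below are illustrative assumptions, not the values used by Galli et al.

```python
def effective_path_length(cavity_length_m: float, reflectivity: float) -> float:
    """Mean optical path (metres) for a two-mirror cavity with mirror reflectivity R.

    Assumes the only loss is mirror transmission: the mean number of passes
    is sum(R**n for n >= 0) = 1 / (1 - R).
    """
    if not 0.0 <= reflectivity < 1.0:
        raise ValueError("reflectivity must lie in [0, 1)")
    return cavity_length_m / (1.0 - reflectivity)


if __name__ == "__main__":
    # Hypothetical 0.5 m cavity with 99.99%-reflective mirrors.
    print(f"{effective_path_length(0.5, 0.9999) / 1000:.1f} km")
```

Under these assumptions, a half-metre cavity behaves like an absorption cell several kilometres long, which is the origin of the sensitivity gain described above.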

Another feature of the cavity is that, when the infrared light source is interrupted, the radiant energy stored in the cavity ‘rings down’: it decreases over time. Using a powerful infrared laser to ‘saturate’ the vibrational-rotational transitions in carbon dioxide that correspond to infrared absorption, Galli et al. used the rate of ring down as an excellent absolute measure of the concentration of absorbers inside the cavity (an approach that has previously been reported for infrared spectroscopy). The authors obtained a linear concentration response down to a detection limit of about 43 parts per quadrillion, which makes their technique quite well suited for radiodating carbonaceous samples. It may also have applications in positron emission tomography (an imaging technique used in medicine for body scans), which often requires monitoring of carbon dioxide labelled with carbon-11,
an artificial radioactive isotope of carbon. 
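The link between ring-down rate and absorption can be sketched for the simplest, unsaturated form of cavity ring-down spectroscopy. This is an assumption, since Galli et al. used a saturated-absorption variant that requires a more involved analysis. The stored intensity decays as exp(-t/τ), and an absorber filling the cavity adds a loss rate cα to the empty-cavity rate 1/τ₀, so α = (1/c)(1/τ - 1/τ₀). Dividing α by a known absorption cross-section then gives the absorber concentration. The sketch fits τ to synthetic, noise-free decay data, and all numbers are hypothetical.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def ring_down_time(times: list[float], intensities: list[float]) -> float:
    """Fit ln I = ln I0 - t / tau by least squares and return tau (seconds)."""
    logs = [math.log(value) for value in intensities]
    n = len(times)
    mean_t = sum(times) / n
    mean_log = sum(logs) / n
    covariance = sum((t - mean_t) * (y - mean_log) for t, y in zip(times, logs))
    variance = sum((t - mean_t) ** 2 for t in times)
    slope = covariance / variance  # equals -1 / tau
    return -1.0 / slope


def absorption_coefficient(tau_sample: float, tau_empty: float) -> float:
    """Absorption coefficient alpha (per metre) from ring-down times, unsaturated case."""
    return (1.0 / tau_sample - 1.0 / tau_empty) / SPEED_OF_LIGHT


if __name__ == "__main__":
    tau_empty = 10e-6   # hypothetical empty-cavity ring-down time: 10 microseconds
    alpha_true = 1e-6   # hypothetical absorption coefficient, per metre
    tau_sample = 1.0 / (1.0 / tau_empty + SPEED_OF_LIGHT * alpha_true)
    times = [i * 1e-6 for i in range(50)]
    decay = [math.exp(-t / tau_sample) for t in times]
    tau_fit = ring_down_time(times, decay)
    alpha_fit = absorption_coefficient(tau_fit, tau_empty)
    print(f"fitted tau = {tau_fit * 1e6:.4f} us, alpha = {alpha_fit:.3e} per m")
```

Because the measurement is a decay rate rather than an absolute intensity, fluctuations in laser power largely drop out. This is one reason the ring-down approach is attractive for trace detection.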

Galli and colleagues say that the size of 
their experimental set-up is roughly two 
square metres in area, about 100 times smaller 
than the footprint of typical accelerator mass 
spectrometers. Furthermore, the equipment 
costs only about US$400,000 — many times 
less than an accelerator mass spectrometer. 
For widespread adoption of the infrared 
technology, however, it will be necessary to 
reduce the cost even further, say by a factor 
of five or ten. Even so, an infrared method 
for measuring isotope ratios represents a real 
breakthrough because of the many possible 
uses of the technique. And there are other 
advantages. For example, in mass spectrom- 
etry, an ion from a sample is counted only 
once because its measurement neutralizes it. 
But infrared-absorption measurements do not 
destroy the sample, allowing it to be repeatedly 
analysed. 

With further improvements, the infrared 
technique may well become the method of 
choice for measuring the isotope ratios of 
many common elements. Moreover, if the 
anticipated cost reductions are realized, the 
measurement of isotope ratios might become
a widely used tool in determining the origins
of materials used for a broad range of purposes,
from environmental monitoring to medical
research.
Species choked and blended


The appearance of new ecological niches propels the evolution of species, but the 
converse can also occur. A study shows that changing lake habitats have caused 
extinctions and reduced the genetic differences between species. SEE ARTICLE P.357 
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feed in open water and spawn much deeper. 

However, increased human activity around 
the lakes dramatically altered the lakes’ ecol- 
ogy during the twentieth century. Higher 
nutrient levels in the water caused eutrophi- 
cation, in which algal populations increase, 
water quality is reduced and oxygen levels 
at the lake bottom decrease. Vonlanthen 
et al.' propose that these conditions com- 
pressed the depth range in which whitefish 
could spawn, bringing previously separated 
species together to breed, forming hybrids. 
Whitefish feeding patterns were probably also 
affected, through reductions in zooplankton 
diversity and possibly in the density of bottom- 
dwelling prey (Fig. 1), which would also have 
reduced opportunities for exploiting ecological 
variation. 

Vonlanthen and colleagues’ data show 
that the extent of species loss for each lake 
correlates with the severity of that lake’s 
eutrophication. But did these extinctions 
result exclusively from demographic decline 


JEFFREY S. MCKINNON & ERIC B. TAYLOR 


onventional wisdom long held that 
( even if individuals of two different 

species could mate with each other, 
their offspring were doomed to early death 
or sterility. But a different view is taking hold: 
that it is often adaptations to different envi- 
ronments that cause species to separate, such 
that hybrid offspring fail because of their poor 
fit to resources, rather than through intrinsic 
shortcomings’. As a consequence, changes to 
particular environmental conditions that pre- 
viously kept species distinct could increase 
genetic mixing, and thereby reduce species 
number. On page 357 of this issue, Vonlanthen 
et al.’ provide evidence that human alterations 


to lake habitats have eroded barriers between 
species and contributed to extinctions. 

The authors’ study of 17 Swiss lakes shows 
that glacial melting in the past 12,000 years 
provided ecological opportunities, in the 
form of new environmental niches, that led to 
diversification of whitefish species, as has been 
reported for other freshwater fishes’. White- 
fish species divergence is characterized by, for 
example, differences in body size and the num- 
ber of ‘gill rakers’ — cartilaginous structures 
that protrude from fish gills and are involved 
in feeding (Fig. 1). Large-bodied white- 
fish, which have fewer gill rakers, typically 
feed from the bottom of lakes and spawn in 
shallow water in winter, whereas smaller 
species, which have more gill rakers, tend to 


the extinction process we usually think 
of, in which deaths outnumber births? Or 
was reverse speciation at play, in which char- 
acteristics that once defined distinct spe- 
cies are merged into a single hybrid species? 

The authors report’ several lines of evi- 
dence suggesting a role for reverse speciation 
in the lakes. First, the severity of eutrophi- 
cation is the best predictor of genetic dif- 
ferentiation of modern whitefish — lakes 
that suffered the greatest eutrophication 
contain species that are the least geneti- 
cally different from each other. Histori- 
cal DNA samples also allowed Vonlanthen 
and colleagues to document a progressive reduction in whitefish genetic differentiation in one of the lakes (Lake Constance) between 1926 and 2004. Furthermore, they find strong genetic traces of the extinct whitefish species Coregonus gutturosus in extant sister species, implicating hybridization in that extinction. The authors also document lessened differences in the fishes’ gill-raker numbers, a key characteristic, in the most polluted lakes. This finding is consistent with the hypothesis that eutrophication reduced ecological opportunity, which in turn weakened selection for differences in feeding traits.

Figure 1 | Loss of fish biodiversity through eutrophication. a, Before human activity raised nutrient levels in lake waters, the Swiss lakes studied by Vonlanthen et al. were well oxygenated at all depths, and there were diverse invertebrate prey communities in both the open water (suggested by other studies to be generally smaller prey, represented by the left side of the prey-size distribution) and at the bottom (generally larger prey, right side of distribution). These resources supported genetically distinct species of whitefish (represented as AABB and aabb) with different characteristics, including their body size and number of gill rakers (cartilaginous protrusions from the gills). Large-bodied whitefish with fewer gill rakers generally fed from the bottom and spawned in shallow water, whereas small-bodied species with more gill rakers typically fed in open water and spawned much deeper. b, Lake eutrophication led to lower oxygen levels, especially at depth, driving deep-spawning species into shallower water, where they spawned with other species to form hybrids. Simultaneously, the fishes’ prey became less diverse, thereby reducing divergent selection: the process by which different ecological niches provide a selective pressure for species to have distinct characteristics. c, Increased hybridization and reduced divergent selection, as well as demographic decline, resulted in extinction of the deeper-spawning species, with the remaining species being a genetic hybrid (AaBb) and possessing an intermediate number of gill rakers.
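The loss of genetic differentiation that Vonlanthen and colleagues document can be illustrated with a textbook toy model. The Python sketch below is not the authors’ analysis: it assumes two equal-sized populations, a single locus with two alleles, symmetric exchange of a fraction m of genes per generation (standing in for hybridization), and no drift or selection. The starting allele frequencies and the value of m are hypothetical. It tracks Wright’s F_ST, a standard measure of differentiation between populations.

```python
"""Toy model: gene flow between two demes erodes genetic differentiation."""


def fst(p1: float, p2: float) -> float:
    """Wright's F_ST for one biallelic locus in two equal-sized demes.

    F_ST = (H_T - H_S) / H_T, where H_T is the expected heterozygosity of the
    pooled population and H_S is the mean expected heterozygosity within demes.
    """
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def hybridize(p1: float, p2: float, m: float) -> tuple[float, float]:
    """One generation in which a fraction m of each deme's genes comes from the other."""
    return (1 - m) * p1 + m * p2, (1 - m) * p2 + m * p1


def fst_trajectory(p1: float, p2: float, m: float, generations: int) -> list[float]:
    """F_ST after 0, 1, ..., `generations` rounds of symmetric gene flow."""
    values = [fst(p1, p2)]
    for _ in range(generations):
        p1, p2 = hybridize(p1, p2, m)
        values.append(fst(p1, p2))
    return values


if __name__ == "__main__":
    # Hypothetical starting allele frequencies and hybridization rate.
    for gen, value in enumerate(fst_trajectory(0.9, 0.1, m=0.1, generations=5)):
        print(f"generation {gen}: F_ST = {value:.3f}")
```

Because the exchange is symmetric, the mean allele frequency stays fixed while the gap between the two populations shrinks by a factor of (1 - 2m) each generation; with these inputs F_ST falls from 0.64 to about 0.41 after one generation. Divergent selection, which eutrophication is thought to have weakened, is what would have to counteract this erosion.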

Previous cases of reverse speciation in fishes and birds have shown that altered ecological conditions can erode fragile reproductive barriers and allow the formation of viable hybrids. However, the mechanisms of species collapse have often remained obscure. The current study is noteworthy because it establishes strong links among changed environmental conditions, reduced ecological opportunity and reverse speciation. The scale of the effect in whitefish, studied over decades and across 17 lakes, is also exceptional. The work highlights an under-appreciated aspect of biodiversity loss, ‘cryptic extinction’, whereby considerable morphological and genetic variability is maintained within hybrids, but previously species-specific combinations of these features are lost.

Cryptic extinction may have a particularly high impact on fish biodiversity because individual lakes often contain unique species, and fresh waters contain about 40% of all fish species. But reverse speciation can also occur in terrestrial environments, particularly those similar to lakes, such as volcanic islands.

The major limitation of Vonlanthen and colleagues’ study is its correlational nature. Whitefish hybridization clearly increased in the Swiss lakes as pollution and disturbance increased, but factors in addition to those highlighted by the authors may have contributed to the loss of diversity. One of these is a by-product of demographic decline: as one species becomes rare, finding mates becomes more difficult, and so more frequent hybridization would be expected. Other potential confounding processes include the introduction of whitefish from hatcheries, overfishing and the impact of invasive species. However, despite these other influences, a convincing effect of eutrophication levels on biodiversity emerges from the study.

The work raises a number of additional important questions. How much, and which parts, of the genomes of extant whitefish species are ‘original’ rather than hybrid in origin? Which genes are responsible for the critical differences between whitefish species, and how has the prevalence of variants of these genes altered in response to ecological changes? In addition, what are the relative roles of the two processes of increased hybridization and reduced divergent selection (in which the existence of multiple ecological niches promotes the divergence of distinct species) in driving reverse speciation? Genome-wide analyses of both historical and modern whitefish samples will help to address these questions.

A more practical concern is what happens next. Eutrophication has now been eliminated or greatly reduced in most of the lakes studied, so they more closely resemble their previous state. Can we expect ‘re-speciation’, in which fishes with characteristics of extinct species reappear? Current theory does not provide a clear answer, but suggests that distinct species can re-emerge after a brief collapse. If Vonlanthen and colleagues are correct and speciation reversal is an under-appreciated threat to biodiversity, we need to understand how to prevent and correct the ecological changes responsible, and perhaps learn how to recognize when it truly is too late.
Intraplate volcanism

The origin of volcanic activity occurring far from tectonic-plate boundaries has been a subject of contention. The latest geodynamic model offers a fresh take on the matter. SEE LETTER P.386

CIN-TY A. LEE & STEPHEN P. GRAND

In this issue, Liu and Stegman present a hypothesis for the generation of volcanic centres that might change our view of how plate tectonics influences the distribution of volcanic activity on Earth.

The theory of plate tectonics describes the uppermost of Earth’s layers as made up of rigid plates, the relative motions of which are confined to narrow plate boundaries. The boundaries come in three types: divergent, where plates move away from one another and create systems such as mid-ocean ridges; convergent, where one plate slides beneath another, forming subduction zones; and transform margins, where plates slide past one another, as in the San Andreas Fault system.

Plate tectonics successfully explains most of Earth’s geological features. For example, volcanism at mid-ocean ridges can be explained by decompression melting associated with passive upwelling of hot (asthenospheric) mantle in response to plate divergence. Volcanism at subduction zones can be described by a combination of two effects: partial melting of the mantle, driven by return flow in the mantle wedge overlying the subduction zone, and melting-point depression, caused by the influx of water released from the descending plate of the subduction zone.

Volcanism far from plate boundaries (intraplate magmatism) is more difficult to explain with plate tectonics. Some intraplate volcanic systems, such as the Hawaiian volcanic chain in the Pacific plate and the Yellowstone volcanic field in North America, migrate along tracks that seem independent of plate-boundary processes. The effusive but short-lived outpourings of basalts known as flood basalts, some of which are so large that they cover substantial areas of continents or even entire plates, are also not easily described by the interaction of slowly moving plates.

One popular view is that intraplate magmatism is driven by narrow mantle upwellings (plumes) originating from a hot thermal layer at the core-mantle boundary. Therefore, the expression of plumes at Earth’s surface should be independent of plate motions. Flood basalts are thought to record the initial impingement of the anomalously hot plume head, whereas the volcanic track, known as the hot-spot track, records the passage of the plate over the plume’s tail. For example, the eruption of the Steens-Columbia River flood basalt about 17 million years ago is thought to represent the initiation of the currently active Yellowstone hot-spot track, and so is conjectured to fit into the plume theory.
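Under the fixed-plume picture, a stationary source beneath a plate moving at constant speed leaves a chain of volcanoes whose ages increase linearly with distance from the active end. The Python sketch below works through that arithmetic; the constant plate speed, the fixed source and the example numbers are all assumptions for illustration, not values for Hawaii or Yellowstone.

```python
"""Idealized hot-spot track: a plate moving at constant speed over a fixed source."""

# 1 cm/yr sustained for 1 Myr is 10^6 cm, i.e. 10 km.
KM_PER_MYR_PER_CM_PER_YR = 10.0


def volcano_age_myr(distance_km: float, plate_speed_cm_yr: float) -> float:
    """Age (Myr) of a volcano lying `distance_km` along the track from the active end."""
    return distance_km / (plate_speed_cm_yr * KM_PER_MYR_PER_CM_PER_YR)


def plate_speed_cm_yr(distance_km: float, age_myr: float) -> float:
    """Invert the relation: infer the plate speed (cm/yr) from one dated volcano."""
    return distance_km / (age_myr * KM_PER_MYR_PER_CM_PER_YR)


if __name__ == "__main__":
    # Hypothetical track: volcanoes every 100 km on a plate moving at 5 cm/yr.
    for d in (0, 100, 200, 300):
        print(f"{d:>4} km -> {volcano_age_myr(d, 5.0):.1f} Myr")
```

With volcanoes every 100 km on a plate moving at 5 cm per year, the printed ages are 0, 2, 4 and 6 million years. A flood basalt elongated across the track, as the Steens-Columbia River basalt is, has no natural place in such a simple age progression.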

However, the eruption area of the Steens-Columbia River flood basalt is oriented north-south, perpendicular to the Yellowstone track. In addition, the geochemistry of the flood basalt differs from that of the Yellowstone volcanics, complicating the plume hypothesis. Alternatively, the Steens-Columbia River flood basalt could be associated with extension of the upper plate behind the Cascades volcanic arc (back-arc spreading). But this phenomenon does not seem to explain the sudden appearance of the Steens-Columbia River flood basalt.

In their study, Liu and Stegman (page 386) propose that the Steens-Columbia River flood basalt is a natural consequence of slowing convergence between the North American plate and the ancient Farallon plate. This slow-down was presumably associated with the approach of a mid-ocean ridge between the Farallon and Pacific plates 20 million years ago, now manifested as the active Juan de Fuca ridge. The authors performed geodynamic calculations with initial and boundary conditions constrained by observed relative plate motions and plate age. They show that stretching and eventual tearing of the Farallon plate accompanied the slow-down of convergence, resulting in detachment of the Farallon plate.

Liu and Stegman find that the model that best reproduces the presumed current location of the Farallon plate, as constrained from seismic tomography, predicts tearing to have begun about 16 million to 17 million years ago, when the Steens-Columbia River flood basalt initiated. Dynamic pressures generated from this tear resulted in rapid mantle upwelling through this gap in the slab, driving a magmatic flare-up that mimics the structural trend of the Steens-Columbia River flood basalt (Fig. 1).

If Liu and Stegman’s model is correct, the implication is that some intraplate magmatism can be explained by the development of gravitational instabilities within subducting slabs. In their model, thermal upwelling is still responsible for flood basalts, but unlike traditional plumes, which derive from the lowermost mantle, an upper-mantle origin is implied. There are, however, some features that remain unresolved. For example, the model does not provide a good explanation for the eastward migration of the Yellowstone hot-spot track, the high ratio of helium-3 to helium-4 in Yellowstone volcanics or the presence of a seismic-velocity anomaly extending into the lower mantle beneath Yellowstone. And it may not explain the isotopic signatures seen in the Steens-Columbia River flood basalt.

If accurate, Liu and Stegman’s model should apply to other locations where slab tears have occurred. Such a tear clearly happened in central California about 20 million years ago, because the last remnants of the Farallon plate were captured on the coast of California, but the rest of the Farallon is no longer present beneath the state. There is evidence of a flare-up in basaltic magmatism east of central California, for example on the border of California and Nevada, during this time. But the magnitude does not seem comparable to that of the Steens-Columbia River flood basalt, suggesting that different boundary conditions might need to be considered in the authors’ model.

Figure 1 | Columbia River Gorge, Oregon. Large, effusive outpourings of basalts, such as the Steens-Columbia River flood basalt in Oregon and Washington exposed here on the margins of the river, are usually attributed to the impingement of thermal plumes arising from the core-mantle boundary. Liu and Stegman show that the timing and distribution of eruption may instead be related to tears developed within subducting slabs.

In any case, Liu and Stegman’s study is pertinent because it draws more attention to the role of subducting slabs in generating intraplate magmas. The following examples might be considered. Where a young oceanic plate is subducting, a slab tear, accompanied by large-volume magmatic flare-ups, should develop because young plates are difficult to subduct. This hypothesis may apply to the eastern Pacific. By contrast, when an old oceanic plate is subducting, a long segment of the slab might be expected to stagnate temporarily in the transition zone between the upper and lower mantle. The juxtaposition of cold slab material against hot mantle at depth would generate small-scale thermal upwellings along the edges of the slab. These upwellings could generate widespread basaltic magmatism far from the subduction trench, as seen in northeastern China. If the edges of the slab are migrating relative to the upper plate, hot-spot tracks could be generated.

We note that all of these upwellings are sourced in the upper mantle and are likely to be superimposed on magmatism associated with back-arc spreading; thus a complicated pattern of magmatism is expected. Should a subducting slab penetrate deep into the lower mantle, upwellings might be expected even further from plate boundaries.

In conclusion, there is reason to speculate that intraplate magmas might be intimately linked to subducting slabs. In other words, it is conceivable that plate tectonics generates many intraplate magmas. Differences in the magnitude and locations of intraplate magmas may simply be controlled by the scale of subducting slabs. The debate over whether deep-seated thermal plumes exist remains unresolved because these narrow upwellings are difficult to detect. An alternative approach is to map out the geometry and length scale of subducting slabs, which may be easier to detect by various geophysical methods. Liu and Stegman’s model shows how downwellings, such as subduction, must be considered when seeking to understand the origin of upwellings and their associated magmatic activity.
A sweet way of sensing danger

Cells can destroy invading bacteria through a digestive process called autophagy. A study finds that sugar molecules, exposed by bacterial damage to the cell’s membrane, can trigger this process. SEE LETTER P. 414

JU HUANG & JOHN H. BRUMELL

The bacterium Salmonella enterica serovar Typhimurium is a leading cause of food poisoning and a well-studied model pathogen. On ingestion by its host, the microbe can penetrate and grow in the gut’s epithelial cells, and damage cell membranes. These events can lead to inflammation and to the pathogen’s spread to other tissues. To combat the infection, one defence mechanism available to cells is autophagy, a process by which intracellular bacteria are digested in cytoplasmic vesicles known as lysosomes. On page 414 of this issue, Thurston et al. report that sugar molecules normally present on the cell’s surface can act as a ‘danger signal’ in the cytoplasm to target S. Typhimurium for autophagy.

Autophagy is a highly regulated process, during which cytoplasmic cargoes are captured in a double-membrane vesicle, the autophagosome. This vesicle then fuses with lysosomes, which are loaded with digestive enzymes that eventually degrade the vesicle’s contents. Cells use autophagy to maintain a balance between the synthesis and degradation of their own components, and malfunction of the process has been linked to human diseases such as cancer, neurodegenerative disorders, diabetes and inflammatory bowel disease. The autophagy system can also target invading pathogens such as S. Typhimurium for degradation, although the precise mechanisms underlying this process are not well understood.

Early after cell invasion, S. Typhimurium resides in a vesicle known as the Salmonella-containing vacuole (SCV). Some of the bacteria then use specialized protein machinery (called a type III secretion system) to generate pores in the SCV membrane, through which they deliver protein effectors into the cell’s cytoplasm. The effectors modulate the activity of the host’s cellular machinery to promote intracellular growth of the pathogen. Moreover, damaged SCVs allow some of the bacteria to escape and replicate in the cytosol. However, the microbes in damaged vesicles can also ‘attract’ components of the autophagy machinery, such as the protein LC3, as well as autophagosomes that engulf the damaged SCVs and restrict bacterial growth (Fig. 1).

So how does the autophagy system recognize the bacteria in damaged SCVs? Host factors such as reactive oxygen species and the lipid diacylglycerol play a part in the recruitment of LC3 to SCVs. In addition, the microbes attract the protein ubiquitin, which seems to recruit autophagy ‘adaptor’ proteins that, in turn, bind to LC3. In particular, the adaptor proteins p62, NDP52 and optineurin contribute to LC3 recruitment to the bacteria. However, these adaptors are not functionally redundant (depletion of any of them impairs antibacterial autophagy), and they bind to different regions, or microdomains, around an individual bacterium. These observations suggest that there may be different signature molecules on the microbial surface and/or on the damaged SCV that attract different adaptor proteins, and that SCV damage is essential to exposing these signature molecules.

To explore potential mechanisms of adaptor recruitment, Thurston et al. focused on galectins, a family of proteins that bind complex sugar molecules. The authors found that, 1 hour after invasion by S. Typhimurium, galectins were associated with a small population of the bacteria in the host cells. When they abolished the expression of a specific galectin, galectin 8, intracellular bacteria grew at an increased rate. Furthermore, galectin 8 co-localized with NDP52 on the intracellular microbes, although at microdomains distinct from those to which p62 or ubiquitin bind.

Thurston and colleagues carried out experiments with live cells and purified proteins that confirmed a direct interaction between galectin 8 and NDP52. Moreover, the authors observed that cells lacking galectin 8 failed to recruit NDP52 to the bacteria at 1 hour after invasion, suggesting that galectin 8 acts as an early signal to attract autophagy adaptors. The recruitment of LC3 to the intracellular microbes was also significantly reduced. When the authors artificially targeted NDP52 to SCVs in the galectin-8-depleted cells, bacterial growth was again restricted. This indicates that galectin-8 recruitment of NDP52 is essential for the induction of antibacterial autophagy.
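The logic of Thurston and colleagues’ depletion-and-rescue experiments can be captured in a Boolean caricature of the recruitment chain: a damaged vesicle exposes host sugars, galectin 8 binds them, galectin 8 recruits NDP52, and NDP52 recruits LC3. In this hypothetical Python sketch each step is simply on or off. It ignores the partial (rather than complete) loss of LC3 recruitment reported in galectin-8-depleted cells and the contributions of p62 and optineurin, so it is a reasoning aid, not a model of the data.

```python
"""Boolean caricature of the galectin-8 -> NDP52 -> LC3 recruitment chain (hypothetical)."""
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    vesicle_damaged: bool = True   # SCV membrane ruptured by the bacterium
    host_glycans: bool = True      # host complex sugars present inside the vesicle
    galectin8: bool = True         # galectin 8 is expressed
    ndp52_tethered: bool = False   # NDP52 artificially targeted to SCVs


def lc3_recruited(c: Condition) -> bool:
    """True if the simplified chain reaches LC3 (p62 and optineurin are ignored)."""
    glycans_exposed = c.vesicle_damaged and c.host_glycans
    galectin8_on_scv = glycans_exposed and c.galectin8
    # NDP52 reaches the SCV either via galectin 8 or by artificial tethering.
    ndp52_on_scv = galectin8_on_scv or c.ndp52_tethered
    # In this sketch, NDP52 on the SCV is taken to be sufficient for LC3 recruitment.
    return ndp52_on_scv


EXPERIMENTS = {
    "wild type": Condition(),
    "galectin 8 depleted": Condition(galectin8=False),
    "galectin 8 depleted + NDP52 tethered": Condition(galectin8=False, ndp52_tethered=True),
    "host cells lacking complex sugars": Condition(host_glycans=False),
    "intact vesicle": Condition(vesicle_damaged=False),
}

if __name__ == "__main__":
    for name, cond in EXPERIMENTS.items():
        print(f"{name:<40} LC3 recruited: {lc3_recruited(cond)}")
```

The first three conditions mirror the experiments described in the text; the outputs for cells lacking complex sugars and for intact vesicles are what the simplified chain predicts, not reported measurements.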

But how do the intracellular bacteria attract the sugar-binding galectin? SCV damage exposes the cytoplasm to different sugar molecules. Although some of these derive from the bacteria, there are also host molecules that were initially present in the vesicle. Thurston et al. report that purified galectin 8 does not bind to S. Typhimurium in vitro, which suggests that the molecule targeted by galectin 8 is not derived from the microbe. Furthermore, the targeting of galectin 8 to intracellular S. Typhimurium was severely impaired in host cells lacking complex sugar molecules, which are normally found on the cell’s surface and lining the inside of some vesicles. The authors found that other vesicle-damaging pathogenic bacteria such as Listeria monocytogenes and Shigella flexneri are also decorated by galectins soon after infection. Moreover, osmotic damage of cytoplasmic vesicles in the absence of bacteria also resulted in the recruitment of galectins to the damaged vesicles. On the basis of their observations, the researchers conclude that galectins can act as sensors of non-specific danger by detecting host sugar molecules that are exposed on damaged vesicle membranes.

Figure 1 | Broken vesicles as danger signals. a, b, Early after the pathogenic bacterium S. Typhimurium invades host cells (a), it resides in a cytoplasmic vesicle known as a Salmonella-containing vacuole (SCV; b). c, The bacterium can damage the vacuole membrane, exposing host sugar molecules to the cytoplasm. Thurston et al. report that the host protein galectin 8 binds to these sugars and triggers a process called autophagy, by which the invading bacteria are destroyed. Specifically, the authors found that galectin 8 recruits another protein, NDP52, by direct interaction. NDP52, in turn, binds to the protein LC3 and recruits other components of the autophagy machinery to the damaged SCV. d, Eventually, the bacterium is enclosed in a specialized vesicle known as an autophagosome, which forms from the isolation membrane. e, Other vesicles (lysosomes) containing digestive enzymes can then fuse with the autophagosome, forming the autolysosome within which the pathogen is destroyed. f, The authors found that osmotic damage to cytoplasmic vesicles, in the absence of bacteria, also exposes host sugars that recruit galectin 8.

Although it has been speculated that damaged SCVs serve as a signal to target bacteria for autophagy, Thurston and colleagues’ work provides much-needed insight into the mechanistic details. Their results show that, when the microbes try to escape into the cytoplasm by disrupting the vesicles, host sugar molecules are exposed. Cytoplasmic galectin 8 then functions as a danger receptor: it binds to the exposed carbohydrates and recruits NDP52, which further attracts LC3 and autophagy machinery to the damaged compartment, thus triggering antibacterial autophagy soon after infection.

But the authors also show that recruitment of galectin 8 to damaged vesicles is a general danger response. Whether galectin 8 also activates autophagy in other situations in which a cellular organelle is disrupted needs further investigation. Other galectins are recruited to damaged cytoplasmic vesicles such as SCVs, but at present their role in cellular defences against infection is unclear.

Is sugar exposure the only signal required to detect damaged SCVs? Most likely not, because NDP52 is only one of the three adaptors required to target S. Typhimurium to autophagy. So the mechanisms that regulate recruitment of p62 and optineurin to damaged SCVs, and the ways in which the three adaptors cooperatively regulate antimicrobial autophagy, are exciting questions for future studies.
an old outburst

Almost two centuries after the eruption of one of the most massive binary systems in our Galaxy, light reflected from its surroundings has been detected. The observations challenge traditional models for the eruption. SEE LETTER P.375

NOAM SOKER & AMIT KASHI

Extremely massive stars (at least a hundred times more massive than the Sun) are rare astrophysical objects. Historically, they were thought to influence their environment through their luminosity (equivalent to that of several million Suns) and through the explosion, or supernova, that marks the end of their life. On page 375 of this issue, Rest et al. describe an analysis of light echoes from the nineteenth-century ‘Great Eruption’ of η Carinae, one of the most massive two-star systems in the Milky Way. The work strengthens previous claims that such stars have a third mechanism by which to release mass and energy into their environment.

From 1838 to 1858, η Carinae was in a continuous state of energetic outburst. At the time, however, there was no equipment available with which to record its spectrum. The spectrum of an astrophysical source, such as a star or galaxy, can be thought of as its fingerprints: it provides information about the source’s temperature, density, velocity and chemical composition.


In an impressive piece of observational work, Rest and colleagues return, almost two centuries later, to the ‘crime scene’ of η Carinae’s eruption and take its fingerprints. The prime suspect in the case is the less massive star of the binary system, the companion, which stole mass from the more massive and more evolved ‘primary’ star. The primary itself was in an unstable phase during the 20-year period of the eruption. The crime scene is the ambient dust that reflects light from the eruption. This reflected light has taken longer to reach Earth than light that followed a straight path without crossing the dust, and so has arrived only recently; this is why the phenomenon is termed a light echo.
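Why reflected light arrives later follows from simple geometry. The Python sketch below assumes that the observer is much farther away than the dust (so all lines of sight to Earth are parallel) and that the dust is stationary. For dust at distance r from the star, at angle θ between the star-to-dust and star-to-Earth directions, the light path is longer by r(1 - cos θ); measuring r in light-years gives the delay directly in years. The function names and example values are illustrative, not taken from Rest and colleagues’ analysis.

```python
"""Light-echo delay in the far-observer approximation (illustrative)."""
import math


def echo_delay_years(r_ly: float, theta_deg: float) -> float:
    """Extra travel time (years) of light scattered by dust r_ly light-years from the star.

    theta_deg is the angle at the star between the direction to the dust and the
    direction to the observer. Scattered light travels r to reach the dust and then
    roughly D - r*cos(theta) to the observer, versus D for direct light, so it
    arrives later by r*(1 - cos(theta)).
    """
    return r_ly * (1.0 - math.cos(math.radians(theta_deg)))


def dust_distance_ly(delay_years: float, theta_deg: float) -> float:
    """Invert the relation: distance of dust at angle theta that echoes after delay_years."""
    denom = 1.0 - math.cos(math.radians(theta_deg))
    if denom <= 0.0:
        raise ValueError("dust exactly in front of the star (theta = 0) gives no delay")
    return delay_years / denom


if __name__ == "__main__":
    for theta in (30, 90, 150, 180):
        print(f"r = 100 ly, theta = {theta:>3} deg -> delay {echo_delay_years(100, theta):6.1f} yr")
```

Dust 100 light-years from the star at right angles to the line of sight echoes about 100 years late, dust directly behind the star twice as late, and dust nearly in front of the star produces almost no delay.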

From spectral analyses of light echoes from η Carinae’s eruption, Rest et al. deduce that the temperature of the gas ejected during the outburst was about 5,000 kelvin. This temperature, the authors say, is lower than that expected from conventional eruption models, in which the outflow of the eruption takes the form of a strong stellar wind. They further argue that such a temperature fits best with a hydrodynamic eruption mechanism. One possible model for a hydrodynamic eruption involves mass transfer from the primary to the companion.

Figure 1 | V838 Monocerotis light echoes. The six panels show a time sequence of light echoes from the 2002 eruption of the star V838 Monocerotis, as light from the eruption reached and illuminated the star’s dusty surroundings, at increasing distances from the star, and travelled to Earth. The star itself is not resolved, but it is located at the centre of the structures in these images. The eruption of V838 Monocerotis and the nineteenth-century eruption of the two-star system η Carinae studied by Rest and colleagues have some common properties, despite the different nature of the erupting stars.

According to this binary model, during the 20-year Great Eruption, the companion would have accreted matter in the form of gas in an amount equivalent to several times the mass of the Sun. A huge amount of gravitational energy would have been released during this accretion process, which would have been the main energy source of the Great Eruption. Furthermore, some of the mass accreted by the companion would have been blown by the companion itself in two opposing directions, leading to the shaping of the Homunculus bipolar nebula, which is now observed to surround the binary system. Most of the mass in the nebula was blown directly by the primary star. The present masses of the primary and companion may be up to 170 and 80 times that of the Sun, respectively.
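The scale of the gravitational energy in this binary model can be checked with the standard estimate E ≈ GMΔM/R for mass ΔM falling onto a star of mass M and radius R. The inputs in the Python sketch below are assumptions: 80 solar masses for the companion (the upper value quoted in the text), a radius of 20 solar radii (not given in the text) and 3 solar masses standing in for ‘several’. The formula gives an upper-end, order-of-magnitude figure, because not all of the released energy need emerge as light.

```python
"""Order-of-magnitude accretion energy for the binary eruption model (SI units)."""

G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.989e30   # solar mass, kg
R_SUN = 6.957e8    # solar radius, m
L_SUN = 3.828e26   # solar luminosity, W
YEAR = 3.156e7     # seconds per year


def accretion_energy_j(m_star_msun: float, r_star_rsun: float, dm_msun: float) -> float:
    """E ~ G * M * dM / R for mass dM falling onto a star of mass M and radius R."""
    return G * (m_star_msun * M_SUN) * (dm_msun * M_SUN) / (r_star_rsun * R_SUN)


def mean_luminosity_lsun(energy_j: float, duration_yr: float) -> float:
    """Average power over the accretion episode, in units of the Sun's luminosity."""
    return energy_j / (duration_yr * YEAR) / L_SUN


if __name__ == "__main__":
    # Hypothetical inputs: 80 solar-mass companion, 20 solar radii (assumed),
    # 3 solar masses accreted over the 20-year eruption.
    e = accretion_energy_j(80.0, 20.0, 3.0)
    print(f"energy ~ {e:.1e} J ~ {e * 1e7:.1e} erg")
    print(f"mean luminosity ~ {mean_luminosity_lsun(e, 20.0):.1e} L_sun")
```

With these inputs the energy comes out at a few times 10^42 joules, and spreading it over 20 years gives an average luminosity of roughly 10^7 times the Sun’s. Halving the assumed radius doubles both numbers, so the result is only as good as that assumption.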

During the Great Eruption, η Carinae experienced two bright peaks in luminosity, in 1838 and in 1843 (refs 4, 5). Rest et al. find that the echoes’ light curves (graphs of their intensity as a function of time) are consistent with these peaks. The time difference between the two peaks, about five years, corresponds to the orbital period of the binary system around 1840; at present, the orbital period is five and a half years. The peaks themselves occurred when the two stars were closest together in their elliptical orbit around each other. Rest and colleagues’ analysis of the echoes’ spectra and light curves lends some support to an eruption model in which energy comes from mass transfer that is triggered at the stars’ closest approach.

The temperature of about 5,000 K and the occurrence of two strong peaks (two weaker peaks are recorded historically at around 1849 and 1854) are reminiscent of the eruptive event that the star V838 Monocerotis experienced in 2002 (Fig. 1). One popular model for this eruption posits that a low-mass star of about half the mass of the Sun was destroyed in a merger with a star about six times more massive than the Sun. The accretion of gas from the low-mass star onto the surface of the more massive star would have been the energy source of the eruption. As in the case of η Carinae, the star that accreted mass is a non-evolved star, like the Sun: it is at an evolutionary stage during which nuclear-fusion reactions of hydrogen still occur in its centre.

The progenitor of η Carinae’s eruption seems to fall into a varied group of systems that undergo eruptions powered by impulsive mass accretion onto non-evolved stars. The accreting stars can be very massive, as for η Carinae; five to eight times as massive as the Sun, as for V838 Monocerotis; or Sun-like stars. This heterogeneous group of progenitors might also include dying red-giant stars. Accretion of mass from a dying red-giant star onto a Sun-like star over a time span of 5-50 years could lead to eruptions and shape some bipolar planetary nebulae. Red-giant stars are Sun-like stars in a late phase of evolution, during which they become very bright and large. Planetary nebulae are the last moment of a Sun-like star’s glory: they are beautiful shining clouds of gas and dust that last for 100,000 years. The nebulae are formed from gas that was once part of the outer shells of the red-giant star. Some of these planetary nebulae are known to have been formed over a short period of time, and have a structure that is not unlike that of η Carinae. One example of such nebulae is the bipolar planetary nebula NGC 6302 (ref. 9).
As Rest and colleagues mention, a few more years of data are required to improve the echoes’ light curves and to test their consistency with the historical observations. This will definitely help to nail down the origin of the eruption event, and to find out whether it was triggered by mass transfer to the companion or by some as-yet-undetermined eruptive event in the primary itself, as proposed by some traditional models. Although it has been studied for more than a century, η Carinae still holds several secrets. In the coming years, it is hoped that observations with modern telescopes will shed more light on this intriguing binary system.

STRUCTURAL BIOLOGY

Ion channel in the spotlight

When expressed in neurons, channelrhodopsin proteins allow the cells’ electrical activity to be controlled by light. The structure of one such protein will guide efforts to make better tools for controlling neurons. SEE ARTICLE P.369


OLIVER P. ERNST & THOMAS P. SAKMAR

Imagine taking a pigment from the eyespot (the light-receptive organelle) of a motile, photosynthetic alga and putting it into the neuron of a living mouse. Now imagine exciting the pigment using laser light and seeing a reproducible effect of this stimulus on the behaviour of the mouse. It sounds unbelievable, but this is the basis of optogenetics: the combination of optical techniques and genetic engineering that allows light to control an organism’s physiology and behaviour.

The algal eyespot pigments that facilitate optogenetics are proteins called channelrhodopsins (ChRs), and they can be thought of as light-activated, nanometre-scale electrodes. When expressed in cells in vitro or in vivo, ChRs target the cell membrane and are bound to a chromophore, a kind of molecular antenna that absorbs light. Illumination of the ChR rapidly causes a flow of cations across the membrane. The resulting electrical current then gradually turns off and the ChR recovers, whereupon the whole process can be repeated. But the precise mechanism for how light opens the channel gate and how the gate closes is not known. On page 369 of this issue, Kato et al. report a high-resolution X-ray crystal structure of a genetically engineered ChR, and use it to propose an explanation for how the isomerization of its chromophore causes pore opening.

Although the behaviour of motile algae has been studied for decades, it wasn’t until 2002 that an eyespot pigment of the alga Chlamydomonas reinhardtii was identified as the light-activated protein channelrhodopsin 1. Three years later, ChRs were expressed in mammalian neurons and used to facilitate the light-induced stimulation of the cells’ activity. Subsequent bioengineering of ChRs, enabling optical control of cells on the millisecond timescale, together with the development of systems for delivering genes to specific cell types, boosted the rapidly growing field of optogenetics. Since then, the use of this technology has grown exponentially, with no signs of its popularity waning.


Microbial opsins, the family of light-activated proteins that includes channelrhodopsin, have been the most commonly used protein tools for optogenetics. The light sensitivity and spectral absorption of opsins are due to the fact that the proteins are covalently bound to their chromophore (all-trans-retinal, a derivative of vitamin A). These proteins share a common structural plan, which includes seven transmembrane helices and a characteristic bond (known as a Schiff base) that connects retinal to a lysine amino-acid residue in helix 7.

The first microbial opsins to be identified were bacteriorhodopsin and halorhodopsin, both of which were found in halobacteria. Bacteriorhodopsin (BR) uses light energy to pump protons out of cells, whereas halorhodopsin pumps chloride ions in the opposite direction. In halobacteria, a complex of a sensory rhodopsin and a transducer protein mediates phototaxis (the microbe’s movement in response to light). But in microalgae, ChR performs this task without a transducer, opening its pore in response to light to generate an ion current. It can do this because the light-sensitive chromophore and the channel reside on the same polypeptide chain.

What puts these molecular channels and 
pumps in the top drawer of the optogenet- 
ics toolbox for neuroscientists is the fact that 
they allow light to be used as a fairly innocu- 
ous method to change the ion gradient across 
the membrane of a neuron, thereby enabling 
cell depolarization (neuronal activation) or cell 
hyperpolarization (neuronal silencing). The 
latest optogenetic gadgets actually contain two 
microbial opsins linked in tandem, a system 
that allows greater control of ion flow com- 
pared with previously used individual opsins’. 
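The potential toward which an open cation channel drives the membrane explains why a channel like ChR depolarizes a neuron. The sketch below estimates that potential with the Goldman-Hodgkin-Katz voltage equation restricted to monovalent cations. The concentrations, the equal Na+/K+ permeability and the resting potential are illustrative assumptions. Real ChRs also conduct H+ and Ca2+ and are not equally permeable to every cation.

```python
import math

R = 8.314     # gas constant, J mol^-1 K^-1
F = 96485.0   # Faraday constant, C mol^-1


def ghk_cation_reversal_mv(perms, c_out, c_in, temp_k=310.0):
    """GHK reversal potential (mV) for a channel permeable only to monovalent cations."""
    num = sum(p * c_out[ion] for ion, p in perms.items())
    den = sum(p * c_in[ion] for ion, p in perms.items())
    return 1000.0 * R * temp_k / F * math.log(num / den)


# Illustrative textbook-style concentrations in mM (assumptions, not from the article).
c_out = {"Na": 145.0, "K": 4.0}
c_in = {"Na": 12.0, "K": 140.0}

e_na = ghk_cation_reversal_mv({"Na": 1.0, "K": 0.0}, c_out, c_in)     # Na+-only channel
e_k = ghk_cation_reversal_mv({"Na": 0.0, "K": 1.0}, c_out, c_in)      # K+-only channel
e_mixed = ghk_cation_reversal_mv({"Na": 1.0, "K": 1.0}, c_out, c_in)  # non-selective
v_rest = -70.0  # assumed resting membrane potential, mV

print(f"E_Na {e_na:+.1f} mV, E_K {e_k:+.1f} mV, non-selective {e_mixed:+.1f} mV")
print("opening the non-selective channel depolarizes:", e_mixed > v_rest)
```

With these numbers the non-selective channel reverses near 0 mV. That is far above a resting potential of about -70 mV, so opening the channel drives the membrane positive. Light-driven pumps such as halorhodopsin work differently. They use light energy to move ions rather than letting ions flow down their gradient, so this equation does not describe them.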

Although a wealth of structural and 
biophysical studies have improved our under- 
standing of the pump processes for BR and 
halorhodopsin, relatively little is known about 
the gating process of ChR. What is known is 
that the ChR process, like those of BR and 
halorhodopsin, is cyclic, with each cycle last- 
ing tens of milliseconds and involving several 
intermediates. Experiments that introduced 
targeted mutations into ChRs, and analysed 
the proteins’ electrophysiological and spectro- 
scopic properties, have also yielded modified 
ChRs that show altered ion preferences, spec- 
tral properties and pore-opening and -closing 
kinetics. 
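A three-state kinetic scheme gives a rough picture of the cyclic behaviour described above: light opens the channel, the current turns off and the protein recovers. This is a toy model and not the ChR photocycle, which published models usually describe with more states. The state names C (closed, dark-adapted), O (open) and D (closed, not yet recovered) and all rate constants are hypothetical. The open fraction stands in for the photocurrent.

```python
def simulate_photocycle(k_light=0.5, k_close=0.05, k_recover=0.01,
                        light_window_ms=(0.0, 200.0), t_end_ms=400.0, dt_ms=0.01):
    """Euler integration of a toy C -> O -> D -> C cycle (hypothetical rates, per ms).

    Light drives C -> O only inside light_window_ms; O -> D (current turns off)
    and D -> C (recovery) also proceed in the dark.
    Returns (time_ms, open_fraction) sampled once per millisecond.
    """
    c, o, d = 1.0, 0.0, 0.0           # all channels start closed and dark-adapted
    per_ms = int(round(1.0 / dt_ms))  # integration steps per millisecond
    samples = []
    for i in range(int(round(t_end_ms / dt_ms)) + 1):
        t = i * dt_ms
        if i % per_ms == 0:
            samples.append((t, o))
        light_on = light_window_ms[0] <= t < light_window_ms[1]
        flux_co = (k_light * c) if light_on else 0.0  # photoactivation
        flux_od = k_close * o                         # channel closes
        flux_dc = k_recover * d                       # return to dark state
        c += dt_ms * (flux_dc - flux_co)
        o += dt_ms * (flux_co - flux_od)
        d += dt_ms * (flux_od - flux_dc)
    return samples


samples = simulate_photocycle()
peak = max(o for _, o in samples)
plateau = samples[199][1]  # open fraction just before the light is switched off
print(f"peak open fraction {peak:.2f}, plateau {plateau:.2f}, after light off {samples[-1][1]:.4f}")
```

With these numbers the open fraction peaks within a few milliseconds. It then relaxes to a lower plateau while the light stays on, because channels enter D faster than they recover from it. The plateau level depends only on the ratio of the three rates.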

Kato et al.” now report that the structure of 
ChR (Fig. 1), although similar to that of BR 
in some respects, also brings a few surprises. 
These unexpected features might explain the 
properties of some of the commonly used 
engineered ChR mutants. When the authors 
superimposed the structure of their ChR on 
that of BR, they found that the transmembrane 
domain and position of retinal are similar. 
But unlike BR, which assembles in trimers 
in the membrane of halobacteria, ChR forms 
a dimer in which the two subunits are in
close contact, in agreement with a previously
proposed structure® obtained using electron
crystallography.

Figure 1 | Structure of a closed light-gated cation
channel. Channelrhodopsins (ChRs) are proteins
that form channels in microbial cell membranes.
The channels form from seven transmembrane
helices (shown as cylinders) and open in response
to light, allowing cations to pass through the
membrane. Their light sensitivity is caused by
a molecule, all-trans-retinal, that is covalently
attached to the protein. Kato et al.’ report the X-ray
crystal structure of a chimaeric ChR constructed
from two other ChRs, ChR1 and ChR2; the purple
parts of the protein are from ChR1 and the brown
parts are from ChR2. The chimaeric ChR forms
dimeric structures, but only one ChR is depicted.
The authors find that a negatively charged pore sits
between helices 1, 2, 3 and 7 and is interrupted by
two trios of amino acids, which form gates S1 and
S2. They propose that light-induced isomerization
of the retinal causes the gates to open, extending the
pore to the cytoplasm.

Another difference between BR and ChR is 
that ChR has extended amino-terminal and 
carboxy-terminal domains. The N-terminal 
extension contains three cysteine amino-acid 
residues, which form covalent disulphide 
bonds with their counterparts in a second 
ChR molecule, enabling dimerization. The 
C-terminal extension forms a β-sheet at
the end of the long helix 7, which protrudes 
into the intracellular space. For their study, 
Kato et al. crystallized a truncated version of 
ChR, which — like ChRs used as optogenetic 
tools — consists only of the transmembrane 
part and lacks more than half of the naturally 
occurring protein. So, the β-sheet observed by
the authors may be a part of the large, mostly 
missing C-terminal domain, which is thought 
to be involved in subcellular localization and 
tethering of the ChR to the algal eyespot. 

Perhaps the most notable difference between 
ChR and BR is that the extracellular ends 
of helices 1 and 2 in ChR are tilted outward 
by about 3–4 ångströms with respect to the
analogous helices in BR. Together with helices
3 and 7, this creates a pore extending halfway 
through the protein. The authors observe that 
the inside surface of the pore contains many 
negatively charged amino-acid residues. Most 
of these are glutamic acid residues from the 
extracellular part of helix 2, suggesting that 
this helix is mainly responsible for defining 
the pore’s conductance and ion selectivity. 
The negatively charged pore elongates into 
a slightly positively charged vestibule in the 
extracellular space. 

The pore created by helices 1, 2, 3 and 7 
ends in the middle of the protein, where reti- 
nal resides. On its intracellular side, the pore 
is constricted by two gates, each consisting of 
three residues from different helices. The three 
residues of the innermost gate form a hydro- 
gen-bonding network next to retinal’s attach- 
ment site, suggesting that helix movements 
caused by retinal’s isomerization might break 
the network and open the gate. The movement 
of helices 1 and 2 might also open a second gate 
further towards the intracellular side, along the 
putative cation channel. Although Kato et al.’
argue quite convincingly that the cation-conductance
pore comprises helices from a single
ChR molecule, an alternative hypothesis is 
that the ChR dimer assembles to form the pore 
using elements from each of the two ChRs’. 
Such an arrangement would be reminiscent of 
the situation reported’ for two-pore-domain 
potassium channels. 

With this first report’ of a high-resolution
crystal structure of an engineered ChR, are 
we at the dawn of the age of ‘structural opto- 
genetics’? Structure-based design, combined 
with the discovery of other useful microbial 
ChRs, might produce optogenetic tools that 
have highly specific properties tailored for 
the study of individual cell types and signal- 
ling processes. And although it is too early to 
say ‘take photons, not drugs’, the potential for
optogenetics to revolutionize neuroscience and
neurology is now even more in the spotlight.
Although proponents of RNA might beg to differ, in the hierarchy of popular interest, DNA
has historically held more sway. Being able
to decipher genomes was seen as a milestone on the 
way to understanding life itself. What genome-wide 
RNA sequencing studies have revealed, however, is the 
unexpected complexity of RNA species encoded by 
DNA, most of which do not code for a protein. We now 
appreciate that such non-coding RNAs exert important 
regulatory controls on many biological processes. 

The reviews in this Insight illustrate some of these 
principles. RNA is synthesized as a single-stranded 
molecule, but it is able to base-pair with itself, other RNA 
molecules or DNA. Hashim Al-Hashimi and colleagues 
discuss how secondary and tertiary structures of RNA are 
influenced by external cues to elicit a specific functional 
output. The cell exploits this dynamism to regulate 
processes such as transcription, post-transcriptional 
processing and translation. Jennifer Doudna and 
colleagues review a microbial adaptive immune 
system, CRISPR (clustered regularly interspaced short 
palindromic repeat). This system incorporates small 
pieces of invading viral or plasmid sequences into the 
bacterial genome as CRISPR loci; when future invasions 
occur, the expressed CRISPR RNAs recognize the 
foreign nucleic acids and mediate their degradation. The 
physiological function of many long non-coding RNAs 
remains undetermined, but Mitchell Guttman and John 
Rinn propose a model in which these molecules act in a 
modular fashion to bind different proteins or hybridize 
to various DNAs or RNAs; this modularity expands the 
scope of a single RNA’s function. Finally, Amaia Lujambio
and Scott Lowe highlight the role of another class of 
much shorter, non-coding RNA — microRNAs — in 
cancer development and suppression, and as a target for 
therapeutic intervention. 

We hope these reviews provide a flavour of how the 
inherent properties of RNA make it a robust species to 
regulate cellular processes. 


Angela K. Eggleston, Alex Eccleston, 
Barbara Marte & Claudia Lupp 
Senior Editors 
Functional complexity and regulation
through RNA dynamics 


Elizabeth A. Dethoff', Jeetender Chugh’, Anthony M. Mustoe’ & Hashim M. Al-Hashimi* 


Changes to the conformation of coding and non-coding RNAs form the basis of elements of genetic regulation and provide 
an important source of complexity, which drives many of the fundamental processes of life. Although the structure of RNA 
is highly flexible, the underlying dynamics of RNA are robust and are limited to transitions between the few conformations 
that preserve favourable base-pairing and stacking interactions. The mechanisms by which cellular processes harness 
the intrinsic dynamic behaviour of RNA and use it within functionally productive pathways are complex. The versatile 
functions of RNA dynamics and the ease with which they are integrated into a wide variety of genetic circuits and biochemical pathways suggest
that there is a general and fundamental role for RNA dynamics in cellular processes.


Analysis of the first X-ray structure of the protein myoglobin’
prompted researchers to ask the question: how do ligands reach
the deeply buried haem group centre? This simple, but powerful,
observation has inspired decades of investigation into the dynamic
behaviour of proteins, so that we now know protein structures are in
constant motion, and that these fluctuations in structure are crucial
to, and sometimes drive, function. Early X-ray structures of RNA contained
indications of the importance of conformational dynamics: large
changes in the helical arms of transfer RNA were observed on the binding
of tRNA synthetase’, and changes in the conformation of ribozymes
needed to be invoked to envision catalytically active states’ *. However,
no one could have anticipated the existence of new genetic circuits that
are based on RNA conformational switches, or that the ‘acrobatic’ nature
of a biopolymer that consists of only four chemically similar nucleotides
would be at the centre of a complex macromolecular structure such as
the ribosome.

The dynamic changes that occur in the structure of RNA serve an
ever-increasing range of functionality that generally follows a common
two-step process (see Supplementary Information for more reviews on
RNA dynamics). The process involves a cellular signal that triggers
RNA dynamics, which are then transduced into a specific biological
output. This Review provides a critical account of RNA dynamics as a
regulatory mechanism and source of functional complexity. We review
the known dynamic properties of RNA structure and emphasize the
unique properties that allow large changes in structure to take place in
a biologically specific and robust manner. We then examine the wide
range of cellular inputs used to interface with RNA dynamics and the
various mechanisms that are used to guide the dynamics to achieve a
broad spectrum of functional outputs.


RNA free-energy landscape 

It is important to distinguish between the two types of RNA dynamics:
‘equilibrium fluctuations’ and ‘conformational transitions’. Equilibrium
fluctuations are related to the thermally activated motions that
occur in RNA. Conformational transitions arise when cellular cues,
such as an increase in the concentration of a metabolite, create a non-
equilibrium state that then relaxes back to equilibrium. This Review
is focused principally on conformational transitions because of their
dominant role in regulatory mechanisms; however, the two motions
are intricately related, as highlighted by studies of RNA and protein
dynamics®”. This, and other aspects of RNA dynamic behaviour that are
relevant to function, is best understood by looking at the free-energy
landscape of RNA*”.

The free-energy landscape specifies the free energy of every possible 
RNA conformation (Fig. 1a). Equilibrium fluctuations correspond to 
the spontaneous jumps that occur between various conformers along 
the free-energy landscape. The population of a given conformer 
depends on its free energy, whereas the transition rate between con- 
formers depends on the free-energy barrier of separation (Fig. 1a). 
Conformational transitions arise when cellular cues perturb the free- 
energy landscape, which leads to a redistribution of conformational 
states (Fig. 1a). The RNA free-energy landscape is punctuated by deep 
local minima, or conformational wells, in which conformations within 
a well are highly similar and conformations from different wells are 
structurally distinct. These are the conformations that are significantly 
sampled by equilibrium motions and are stabilized by cellular cues to 
effect conformational transitions” (Fig. 1a). For example, the degen- 
eracy of base-pairing and stacking interactions, together with the high 
stability of RNA duplexes, results in deep local minima that corre- 
spond to different but energetically equal secondary structures that are 
separated by large kinetic barriers’ (Fig. 1b). As few as two secondary 
structures can dominate the RNA dynamic landscape because the loss 
of energy that accompanies the disruption of just one base-pair can 
markedly destabilize alternative conformations. In addition, RNA of a
given secondary structure can undergo more facile dynamic excursions 
in tertiary structure, which involve smaller energetic barriers. These 
dynamics are commonly dominated by large changes in the relative 
orientation of helical domains, which carry motifs involved in tertiary 
contacts, and occur around flexible pivot points consisting of bulges, 
internal loops and higher-order junctions (Fig. 1c). 
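Two relationships in this paragraph can be written out directly. Populations are set by free energy (the Boltzmann distribution), and rates are set by barrier height (in the simplest transition-state picture, the Eyring equation). The sketch below assumes idealized behaviour. The energies are illustrative, and the Eyring prefactor k_B T / h is a textbook idealization that is not well established for large RNA rearrangements.

```python
import math

R_KCAL = 1.987e-3          # gas constant, kcal mol^-1 K^-1
KB = 1.380649e-23          # Boltzmann constant, J K^-1
H_PLANCK = 6.62607015e-34  # Planck constant, J s


def populations(free_energies_kcal, temp_k=298.15):
    """Boltzmann populations of conformers from their free energies (kcal/mol)."""
    rt = R_KCAL * temp_k
    weights = [math.exp(-g / rt) for g in free_energies_kcal]
    total = sum(weights)
    return [w / total for w in weights]


def eyring_rate(barrier_kcal, temp_k=298.15):
    """Transition-state-theory rate constant (s^-1) for a given free-energy barrier."""
    return KB * temp_k / H_PLANCK * math.exp(-barrier_kcal / (R_KCAL * temp_k))


print(populations([0.0, 0.0]))   # two isoenergetic secondary structures
print(populations([0.0, 2.0]))   # one structure penalized by 2 kcal/mol
print(eyring_rate(10.0) / eyring_rate(20.0))  # speed-up from a 10 kcal/mol lower barrier
```

Two isoenergetic secondary structures are each populated at 50%. Penalizing one by 2 kcal/mol (an illustrative figure, of the order of a single base-pair stack) leaves it at only about 3%. Raising a barrier from 10 to 20 kcal/mol slows exchange by roughly seven orders of magnitude.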

Although these excursions can lead to very large changes in ter- 
tiary structure, they are limited to a narrow set of conformations. For 
example, calculating the set of conformations that are accessible to 
two helices that are connected by a three-residue bulge reveals that 
the interhelical bend angle, when combined with interhelical twist- 
ing, can range from 0° to 180°. Despite this large accessible range, 
the connectivity constraints that are imposed by the bulge junction 
and the steric forces that act on the two helices direct changes in 
the interhelical orientations along a highly directional pathway and 
therefore restrict the conformational space to less than 20% of what
is theoretically possible"®'*"'° (Fig. 1c). In addition, owing to the high
stability of duplexes, residues participating in non-canonical base pairs
can loop out from intra- to extrahelical conformations without significantly
disturbing the structure of the flanking helices’””* (Fig. 1d).
Precise control over these dynamics is encoded within the sequence,
and small sequence variations can greatly alter the relative populations
of different RNA structures and their rates of interconversion’””’. For
example, distinct interhelical orientations can be sampled by changing
the length and asymmetry of junctions'”’*"’, and the tendency of
residues to loop out can be modulated on the basis of sequence-specific
stacking interactions””' (see Supplementary Information for links to
movies and animations of experimentally determined RNA dynamics).

Figure 1 | Shape and form of RNA dynamics. a, The secondary and tertiary
RNA conformations of different low-lying energy states are shown above the
RNA free-energy landscape (green line). The relative populations of each
conformation are shown within the landscape (red balls). Cellular effectors
(bolts) can modify the energy landscape to favour an alternative secondary
structure (top), or preferentially stabilize an alternate tertiary conformation
(bottom). b, Exchange between alternative, isoenergetic secondary structures
(A and B) that are separated by large energetic barriers owing to disruption
of base pairs in the transition state’’. c, The accessible range of interhelical
conformations for an RNA two-way junction consisting of a trinucleotide bulge,
with the possible paths of the bulge, which were excluded during the modelling,
illustrated in red'*"’. The allowed range of conformations is restricted towards
a specific and directed conformational pathway by steric and stereochemical
forces. The structure is rotated 90° to illustrate the bending (left) and twisting
(right) motion. d, Flipping out of a residue (red) participating in a non-canonical
base pair within an RNA internal loop is illustrated, progressing from
an intrahelical stacked to an extrahelical unstacked conformation. The motion
occurs without perturbing flanking Watson-Crick pairs (green).

These features can help to explain the three remarkable aspects of
RNA conformational transitions that are of fundamental importance
for regulatory functions. First, the landscape is hierarchical due to
the height of the energy barriers that separate alternative secondary
structures. Changes in tertiary contacts rarely involve changes in the
secondary structure and the two types of conformational changes can
be used to serve different functions. Throughout this review, we will
use ‘secondary’ and ‘tertiary’ conformational changes to distinguish
between these two types of dynamics. Second, the limited landscape
of energetically favourable conformations allows RNA to undergo
very large changes in structure, but to be directed towards a very specific
set of conformations from a vast number of possibilities. Third,
there is increasing evidence that RNA dynamics are determined by the
underlying RNA free-energy landscape, and to a lesser extent by cellular
cues’”””*. Thus, conformational transitions can be considered perturbations
that guide pre-existing equilibrium fluctuations towards specific
functionally productive pathways. In this way, even an imperfect force 
or cellular signal will drive changes in the RNA structure along a prede- 
termined pathway, which makes the transitions highly robust. 
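Treating conformational transitions as perturbations of a pre-existing landscape corresponds to a standard conformational-selection model. The sketch below assumes an effector that binds only one of two pre-existing conformers (B), present in large excess so that its free and total concentrations are equal. K is the [B]/[A] ratio without effector and Kd is the effector's dissociation constant for B. Both are illustrative parameters, not values for any system discussed here.

```python
def fraction_b_like(k_eq, ligand_m, kd_m):
    """Fraction of RNA in conformer B, free or effector-bound.

    k_eq:     [B]/[A] without effector (pre-existing equilibrium)
    ligand_m: free effector concentration (M), assumed equal to total (excess)
    kd_m:     dissociation constant of the effector for conformer B (M)
    The effector is assumed not to bind conformer A at all.
    """
    weight_b = k_eq * (1.0 + ligand_m / kd_m)  # B plus B-effector, relative to A
    return weight_b / (1.0 + weight_b)


def half_switch_ligand(k_eq, kd_m):
    """Effector concentration at which half of the RNA is B-like: Kd * (1/K - 1)."""
    return kd_m * (1.0 / k_eq - 1.0)


k_eq, kd = 0.01, 1e-6  # illustrative: B about 1% populated, Kd = 1 uM
for ligand in (0.0, 1e-6, 1e-5, 1e-4, 1e-3):
    print(f"[L] = {ligand:.0e} M -> B-like fraction {fraction_b_like(k_eq, ligand, kd):.3f}")
print(f"half-switch at {half_switch_ligand(k_eq, kd):.1e} M")
```

In this model, a conformer populated at only about 1% becomes the majority species once the effector reaches about 100 times its Kd. The effector does not create a new structure; it reweights one that is already sampled.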


Triggers of RNA conformational transitions 

RNA dynamics can be triggered by a remarkably diverse set of molecular 
effectors and environmental cues through several different mechanisms. 
This provides many different points of entry for integrating RNA confor- 
mational transitions into biological circuits and biochemical pathways. 


Specific protein binders 

The most common effectors are proteins that bind their target RNA 
specifically through well-defined structural features, thereby stabilizing 
one or a subset of conformations from the pre-existing energy land- 
scape. For example, the mitochondrial tyrosyl-tRNA synthetase CYT- 
18 from Neurospora crassa binds specifically to group-I introns (a class 
of large self-splicing ribozymes that catalyse their own excision from 
messenger RNA, tRNA and ribosomal RNA precursors) and stabilizes 
the conformation required for catalytic activity™*. Protein binding often 
leads to large changes in the overall orientation of RNA helices around 
junctions such as bulges”, three-way junctions” and other motifs such
as the K-turn””. For example, the spliceosomal U4 small nuclear RNA 
(snRNA) undergoes a sharp transition in the interhelical bend angle, 
from approximately 69° to 25°, around a K-turn motif, when it binds 
to its cognate protein target”* (Fig. 2a). These changes in interhelical 
conformation are driven in part by nonspecific electrostatic interactions 
between basic amino acids and the high negative-charge density that 
builds up at interhelical junctions, and are often observed as equilib- 
rium dynamics in the absence of an effector” *'. For example, unbound 
HIV-1 transactivating response RNA (TAR) dynamically samples the 
many different interhelical orientations that are observed when it is 
bound to seven distinct ligands, including peptide mimics of its cognate 
protein, Tat*' (Fig. 2b). 

In an increasing number of cases, protein binding does not involve 
the stabilization of a specific minimum of the RNA free-energy land- 
scape. Instead, binding selectively lowers the surrounding energy bar- 
riers to accentuate or alter the equilibrium dynamics of the RNA. For 
example, binding of the U1A protein to its cognate RNA target does 
not cause the pre-existing equilibrium interhelical motions to stop, 
but rather induces mobility in regions of the RNA that are in direct 
contact with the protein”. The CBP2 protein from yeast mitochondria 
binds specifically to the bI5 group-I intron and activates large-scale 
RNA equilibrium motions”. Even simple small-molecule ligands lead
to the reorganization of the TAR RNA equilibrium dynamics™. These
observations highlight the importance of embracing a broader view of 
trigger factors as elements that perturb the entire energy landscape and 
guide RNA dynamics rather than simply stabilize a single conformation 
from a dynamic range. 


RNA chaperones and helicases 

As is often the case in RNAs that possess alternative secondary struc- 
tures, the large energy barriers associated with base-pair melting can 
limit the dynamics between RNA conformational wells. In this way, the 
RNA can become kinetically trapped in a metastable, non-equilibrium 
conformation. In response to this, a variety of proteins have evolved that 
possess the RNA ‘chaperone’ activity needed to efficiently drive RNA 
secondary structural transitions over the large energy barriers**”®. For
example, the HIV nucleocapsid protein uses non-specific interactions 
between the RNA and protein to destabilize the RNA helices”. This 
lowers the energetic barrier to conformational exchange, accelerating 
relaxation to equilibrium and allowing metastable RNAs to convert to 
conformations that are more thermodynamically favourable. 
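Two-state relaxation kinetics show how lowering a barrier speeds equilibration without changing the end states. The sketch below assumes a metastable conformer M that converts to a more stable conformer N with first-order rate constants kf (M to N) and kr (N to M). It models a chaperone simply as something that lowers the shared transition state by a fixed amount. The rate constants and the 4 kcal/mol drop are illustrative assumptions.

```python
import math

RT_KCAL = 0.593  # RT in kcal/mol at about 25 degrees C


def fraction_stable(t_s, k_f, k_r):
    """Fraction in the stable conformer at time t, starting fully in the metastable one."""
    k_obs = k_f + k_r      # observed relaxation rate of a two-state system
    at_eq = k_f / k_obs    # equilibrium fraction, set only by the rate ratio
    return at_eq * (1.0 - math.exp(-k_obs * t_s))


def lower_barrier(k_f, k_r, ddg_kcal):
    """Lowering the common transition state speeds both directions by the same factor."""
    factor = math.exp(ddg_kcal / RT_KCAL)
    return k_f * factor, k_r * factor


k_f, k_r = 1e-4, 1e-6  # s^-1, illustrative: a kinetically trapped RNA
k_f_chap, k_r_chap = lower_barrier(k_f, k_r, 4.0)
print(f"after 60 s without chaperone: {fraction_stable(60, k_f, k_r):.3f}")
print(f"after 60 s with chaperone:    {fraction_stable(60, k_f_chap, k_r_chap):.3f}")
```

Both rates scale by the same factor, so the equilibrium distribution is unchanged and only the approach to it is faster. This matches the description of the nucleocapsid protein above.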

Figure 2 | Triggering RNA conformational transitions. a, Conformational
changes in the spliceosomal U4 snRNA K-turn motif (Protein Data Bank (PDB)
ID 2KR8) triggered when it binds to a complex of the human protein PRP31
and the 15.5K protein (PDB ID 2OZB)™. b, Similarity between the TAR RNA
interhelical conformations that are triggered by binding to small molecules, Tat
peptide derivatives and divalent ions (grey helices) and those that are sampled by
equilibrium dynamics (green helices labelled as 1, 2 and 3) in the unbound state,
shown as a horizontal and vertical view. The path of helix II (HII) as it moves from
one unbound, equilibrium conformer to the next is shown by the orange arrows.
HI, helix I. Figure modified, with permission, from ref. 31. c, RNA conformational
transitions during spliceosome assembly on pre-mRNA (dashed line) in the
presence of DExD/H-box helicases and ATP. Sections are colour-coded to
indicate base-pairing after helicase action. d, The RNA structure is modulated
by steering of the co-transcriptional folding pathway. The adenine transcription-terminating
riboswitch is a typical example. The progression of co-transcriptional
folding with and without the ligand (adenine) is shown. Adenine binds to the
aptamer domain and stabilizes the structure, allowing transcription to be turned
on. RNAP, RNA polymerase. e, Two examples of tandem riboswitch architectures.
Left, cooperative binding of glycine by the glycine riboswitch using tandem
aptamer domains and one expression platform. Right, tandem SAM and AdoCbl
riboswitches in which either of the two ligands triggers the conformational switch
and yields an output of gene repression. Sequences that can form transcription
terminator stems are shown in red. f, Conformations of HDV ribozyme pre-cleavage
(PDB ID 1VC7)” and post-cleavage (PDB ID 1DRZ)” states. Enlarged
details of the catalytic core (dashed box) of the two structures are shown, with the
bound substrate (green) and the magnesium ion (yellow) present only in the pre-cleavage
state. g, Melting of the secondary structure around the ribosome-binding
site of virulence genes in the pathogen is triggered by an increase in temperature
that makes the Shine-Dalgarno sequence (SD, blue) available for ribosome
binding and translation initiation.

Other RNA chaperones, such as RNA helicases, help RNA traverse
high energy barriers by unwinding helices and disrupting RNA structure,
as well as promoting the formation of new RNA duplexes to accelerate
conformational transitions in RNAs and ribonucleoprotein (RNP)
complexes*’. These chaperone proteins are important for remodelling
the structure of RNA and RNP complexes because they can anneal 
or unwind RNA strands depending on the environmental cues”. For 
example, helicases are essential in the assembly of the spliceosome, 
which is a complex RNP that consists of five RNAs and multiple proteins 
that catalyses excision of introns from a nuclear pre-mRNA“. Assem- 
bly proceeds through a series of transitions that involve the melting and 
annealing of RNA duplexes that are catalysed by DExD/H-box ATPase 
helicases (Fig. 2c). For example, the U4 RNA escorts the U4-U6-U5 
triple small nuclear RNP complex (tri-snRNP) to the pre-mRNA, but is 
subsequently released by the DExD/H-box helicase Brr2, which cataly- 
ses the melting of the two stems within U4 and U6. This frees the U6 
stem to base-pair with U2 snRNA and leads to a new RNA structure 
that is required for the first transesterification reaction” (Fig. 2c). In 
addition, DExD/H-box proteins are involved in the release of mRNA 
produced in pre-mRNA splicing reactions. For example, the DEAH-box 
splicing factor Prp22 is deposited on spliced mRNA downstream of the 
exon-exon junction and catalyses the disruption of contacts between 
mRNA and U5 snRNP, thereby releasing the spliced mRNA from the 
U5-U6-U2 spliceosomal assembly”. In another example of the vari- 
ety of functions of RNA chaperones, the DExD/H-box protein CYT- 
19 unfolds native and misfolded conformations of a group-I catalytic 
RNA in an ATP-dependent process. A large free-energy gap between the 
native and misfolded conformers directs CYT-19 to unfold misfolded 
conformers more frequently than native conformers. In the process, 
CYT-19 redistributes the two conformation populations, which allows 
native RNA to populate a wider range of conformations than would 
otherwise be possible“. 


Metabolites and physicochemical conditions

Another ingenious strategy is used to modulate RNA structure in 
response to a wide range of metabolite-based effectors, including small 
molecules (such as amino acids, coenzymes and nucleotides***’) and
changes in physicochemical conditions (such as magnesium ion
concentration®* and pH”). It would be difficult, if not impossible, for these
smaller effectors and cellular cues to possess the chaperone activity 
needed to efficiently drive secondary structural transitions over the asso- 
ciated large energy barriers. Instead, this strategy operates on the initial 
RNA-folding process itself, intervening while the energy barriers are 
still low. Specifically, these effectors and cues act by directing the RNA to 
different folding pathways during RNA co-transcriptional folding. This 
process is made possible because RNA is transcribed in one direction
(5’ to 3’) and at a rate that is comparatively slow relative to RNA
folding and effector binding. Each pathway favours one of two
distinct secondary structures, where each secondary structure is associ- 
ated with an alternative biological outcome (Fig. 2d). This trigger mecha- 
nism is implicated in a growing list of other RNA switches, although it has 
been best described for metabolite-sensing riboswitches*”».

Riboswitches are RNA-based genetic elements typically embedded
in the 5’ untranslated region of bacterial genes that regulate expression 
of metabolic genes in response to changes in cellular metabolite con- 
centration”. In a prototypical metabolite riboswitch, a metabolite,
such as adenine, binds to the aptamer domain with exceptional affinity 
and selectivity. This stabilizes an otherwise shallow energy well, which 
induces a redistribution of the aptamer conformational states towards 
one state that, in most riboswitches, sequesters an RNA element into a 
helix of the aptamer domain® (Fig. 2d). In turn, the unavailability of the 
RNA element changes the folding pathway of a downstream decision- 
making expression platform, directing it towards structures that turn 
off (and in some cases, turn on) gene expression, either by forming a 
transcription-terminating helix (Fig. 2d) or by sequestering the
Shine-Dalgarno sequence (a ribosome-binding site located about eight nucleotides
upstream of the start codon in mRNA), thereby inhibiting translation.
This system also keeps the number of spontaneous conformational 
transitions, or premature switching in the absence of ligands, to a mini- 
mum because very large energy barriers separate the two alternative 
secondary structural forms of the expression platform.
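Because effectors must act while the RNA is still being made, switching can be framed as a race between ligand binding and transcription. The sketch below makes several simplifying assumptions. Binding must occur within a fixed window, taken here as the time RNA polymerase needs to transcribe from the end of the aptamer to the point where the expression platform commits. Binding is pseudo-first-order, and a binding event inside the window commits the decision. The association rate constant, window length and transcription speeds are hypothetical values chosen for illustration.

```python
import math


def switching_probability(ligand_m, k_on, window_nt, speed_nt_per_s):
    """Probability that the aptamer binds ligand before the decision point is transcribed.

    Assumes pseudo-first-order binding (ligand in excess) and that a binding event
    inside the window commits the expression platform (unbinding ignored).
    """
    window_s = window_nt / speed_nt_per_s
    return 1.0 - math.exp(-k_on * ligand_m * window_s)


def ligand_for_half_switching(k_on, window_nt, speed_nt_per_s):
    """Ligand concentration (M) at which half of the transcripts switch."""
    window_s = window_nt / speed_nt_per_s
    return math.log(2.0) / (k_on * window_s)


K_ON = 1e5      # M^-1 s^-1, hypothetical association rate constant
WINDOW_NT = 30  # hypothetical nucleotides between aptamer completion and commitment
for speed in (50.0, 25.0):  # nt/s; the slower value mimics pausing (assumed)
    l50 = ligand_for_half_switching(K_ON, WINDOW_NT, speed)
    print(f"{speed:.0f} nt/s -> half-switching at {l50 * 1e6:.1f} uM")
```

In this kinetic picture, the ligand concentration needed to switch half of the transcripts depends on the association rate and the transcription speed, not only on the equilibrium Kd. Slowing the polymerase widens the window and makes the switch more sensitive.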

More complex functionality can be achieved by coupling multiple 
riboswitches together. For example, the glycine riboswitch uses two 
aptamer domains in tandem to cooperatively bind glycine, thereby 
increasing responsiveness to changes in ligand concentrations” 
(Fig. 2e). The tandem arrangement of two entire riboswitches that 
respond to two distinct ligands allows the construction of more sophis- 
ticated genetic circuits such as two-input Boolean NOR logic gates, in 
which either of the two ligands can trigger the conformational switch 
and yield an output of gene repression” (Fig. 2e). In another example, 
the c-di-GMP-sensing riboswitch and a GTP-dependent self-splicing 
group-I ribozyme in the 5’ untranslated region of a putative Clostridium 
difficile virulence gene work in tandem to regulate translation”. In the 
presence of c-di-GMP and GTP, the riboswitch and ribozyme form a
structure that stabilizes a 5’ splice site, and the ribozyme self-splices to 
yield an RNA transcript with a perfect ribosome-binding site located 
upstream of the start codon. Conversely, in the presence of GTP alone, 
alternative base pairing between the riboswitch and ribozyme occurs 
to form a structure that promotes splicing at an alternative site, which 
results in a splicing product without a ribosome-binding site, and thus 
downregulates translation. This RNA arrangement represents the first 
natural example of an allosteric ribozyme. 
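The two tandem architectures can be approximated with a Hill function for cooperative binding and a Boolean truth table for the dual-riboswitch NOR gate. In the sketch below, the Hill coefficients are illustrative rather than measured values for the glycine riboswitch. The gate model treats each riboswitch as a perfect on/off switch.

```python
def hill_occupancy(ligand, k_half, n):
    """Fractional occupancy of a cooperative binder with Hill coefficient n."""
    return ligand ** n / (k_half ** n + ligand ** n)


def fold_change_10_to_90(n):
    """Ligand fold-change needed to move occupancy from 10% to 90%: 81**(1/n)."""
    return 81.0 ** (1.0 / n)


def tandem_nor(ligand_a_bound, ligand_b_bound):
    """Gene expression from two tandem 'off' riboswitches: ON only if neither ligand binds."""
    return not (ligand_a_bound or ligand_b_bound)


for n in (1, 2):
    print(f"n = {n}: {fold_change_10_to_90(n):.0f}-fold ligand change for 10% -> 90%")

for a in (False, True):
    for b in (False, True):
        state = "ON" if tandem_nor(a, b) else "OFF"
        print(f"SAM={int(a)} AdoCbl={int(b)} -> expression {state}")
```

The Hill sketch shows why cooperativity increases responsiveness. Going from 10% to 90% occupancy requires an 81-fold change in ligand for a non-cooperative site (n = 1), but only a 9-fold change for n = 2.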


Chemical reactions 

Chemical reactions, such as cleavage of the RNA phosphodiester back- 
bone, can also reshape the underlying RNA energy landscape so that 
a state that was previously in equilibrium becomes a non-equilibrium 
state, which triggers changes in RNA secondary and tertiary structure. For 
example, X-ray analysis of the structures of precursor and product states 
of the hepatitis delta virus (HDV) ribozyme, which catalyses site-specific 
self-cleavage of the viral RNA phosphodiester backbone, reveals changes
in the local arrangement of catalytic groups, as well as the ejection of a 
catalytically important magnesium ion”. These conformational changes 
may help to accelerate product release (Fig. 2f). Another example is 
seen in the secondary structural switch triggered by cleavage of the 3’ end 
of the pre-18S rRNA during eukaryotic ribosome maturation, which is 
used to enforce a sequential order to the maturation process”. 


Thermal and mechanical triggers 

Other energy-dependent processes can induce the complete ‘melting’ of 
RNA helices. RNA thermosensors alter expression of genes during heat- 
shock response and pathogenic invasion in response to increases in tem- 
perature” (Fig. 2g). For example, when Listeria monocytogenes invades an 
animal host, the pathogen enters a warmer environment, which activates 
a thermosensor located at the 5’ untranslated region of the prfA mRNA”. 
The higher host temperature causes a shift in the energy landscape from 
one that favours the formation of the thermosensor hairpin to one that 
favours the melted, single-stranded conformation. This melting transi- 
tion exposes ribosome-binding sites, which are required for translation. 
Mechanical triggers can also induce the unfolding of RNA hairpins. One 
example is translation-induced unfolding of mRNA hairpins, which is 
thought to slow the rate of ribosome elongation to allow the folding of 
autonomous-folding proteins and protein domains™. 
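The temperature-driven shift described for the prfA thermosensor can be sketched with a two-state melting model. In this model, the hairpin and the melted single strand are the only states, and the fraction melted approximates how often the ribosome-binding site is exposed. The enthalpy and entropy values below are hypothetical, chosen only to place the melting midpoint at 37 °C. They are not measured parameters for the Listeria element.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def fraction_melted(temp_c: float, dh_unfold: float, ds_unfold: float) -> float:
    """Two-state model: fraction of molecules in the melted (open) state.

    dh_unfold (kcal/mol) and ds_unfold (kcal/(mol*K)) describe hairpin -> single strand.
    """
    t = temp_c + 273.15
    dg_unfold = dh_unfold - t * ds_unfold  # positive: the hairpin is favoured
    return 1.0 / (1.0 + math.exp(dg_unfold / (R_KCAL * t)))


# Hypothetical parameters: midpoint Tm = DH / DS = 310.15 K (37 C).
DH = 60.0
DS = DH / 310.15

for temp in (25.0, 30.0, 37.0, 42.0):
    print(f"{temp:.0f} C: {fraction_melted(temp, DH, DS):.3f} melted")
```

The larger the unfolding enthalpy, the sharper the switch around the midpoint. This is how a modest temperature change can flip the favoured state of the energy landscape.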


Functions of secondary structural transitions 

Secondary structural transitions are widely used in gene regulation as 
binary switches that are activated by cellular cues. The switch can be 
transduced into a range of outputs by sequestering or exposing key RNA 
regulatory elements. 


Transcription 

Many RNA switches regulate gene expression at the transcriptional 
level by producing transcription-terminating helices. In addi- 
tion to metabolite-sensing riboswitches, other RNA switches use 
the same strategy to regulate gene expression in response to more 
complex molecules***’. For example, in the T-box mechanism, 
non-aminoacylated or uncharged tRNA can activate transcription of
the cognate gene that encodes its aminoacyl-tRNA synthetase. The 
interaction between the acceptor end of an uncharged tRNA and resi- 
dues in the antiterminator bulge in the 5’ untranslated region of the 
mRNA (Fig. 3a) promotes formation of an antiterminator helix dur- 
ing co-transcriptional folding that allows transcription to continue. 
However, the acceptor end of aminoacylated or charged tRNA cannot 
interact with the antiterminator helix residues, which results in forma- 
tion of the more stable terminator stem that aborts synthetase gene 
transcription” (Fig. 3a). Only a few proteins have been identified that 
modulate transcription by influencing folding of transcription-termi- 
nating helices. One example is the tryptophan-activated RNA-binding 
attenuation protein (TRAP), which binds trp mRNA to regulate gene 
expression at both the transcriptional and translational level by sev- 
eral processes (for example, promoting the formation of a terminator 
hairpin that terminates transcription™). 


Translation 

A growing list of protein- and RNA-triggered®' RNA
switches regulate translation by sequestering or exposing ribo-
some-binding sites or by affecting the structure of ribosomal RNA,
thereby blocking or permitting translation. For example, a protein-dependent
RNA switch has recently been identified in the 3’ untranslated region 
of VEGFA mRNA in myeloid cells that regulates translation of VEGFA
in response to proteins associated with two disparate stress stimuli
(Fig. 3b). The interferon-γ (IFN-γ)-activated inhibitor of translation
(GAIT) complex binds a structural GAIT element within a family of
inflammatory mRNAs and silences their translation by promoting the 
formation of a translational-silencing (TS) conformer®. During oxida- 
tive stress, the heterogeneous nuclear ribonucleoprotein L (hnRNP L) 
overrides GAIT silencing by triggering a secondary structural RNA 
switch to a translation-permissive (TP) conformer, in which the GAIT 
element is occluded. The RNA alternates between two mutually
exclusive conformers in response to the binding of the GAIT complex
or hnRNP L, thereby functioning as an AND NOT Boolean logic-gate
switch in which the presence of one protein, but not the other, yields 
an output of gene repression (Fig. 3b). 
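The AND NOT behaviour of the VEGFA switch can be written as a truth table. This minimal Python sketch makes two assumptions: each protein is simply bound or unbound, and hnRNP L binding dominates whenever both proteins are present, as the override described above implies. The function name is hypothetical.

```python
def gait_switch_repressed(gait_bound: bool, hnrnp_l_bound: bool) -> bool:
    """AND NOT gate: translation is silenced only when the GAIT complex is
    bound AND hnRNP L is NOT bound."""
    # hnRNP L drives the RNA into the translation-permissive (TP) conformer,
    # which occludes the GAIT element and so overrides GAIT silencing.
    return gait_bound and not hnrnp_l_bound


for gait in (False, True):
    for hnrnp_l in (False, True):
        state = "silenced (TS)" if gait_switch_repressed(gait, hnrnp_l) else "not silenced"
        print(f"GAIT={gait}, hnRNP L={hnrnp_l}: {state}")
```

Only one of the four input combinations yields repression. This is what distinguishes an AND NOT gate from the NOR arrangement of tandem riboswitches, where either input alone represses.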


Post-transcriptional processing 

An increasing number of RNA switches are involved in regulating post- 
transcriptional processing; for example, splicing, gene silencing by 
microRNA (miRNA) and RNA editing. Although the detailed mechan- 
ics of many of these systems are still unknown, in all cases the RNA 
switch exposes, occludes or modulates the structure of the processing 
sites to regulate post-transcriptional processes. For example, one of 
the thiamine pyrophosphate riboswitches discovered in eukaryotes
regulates alternative splicing™ (Fig. 3c): changes in the secondary
structure sequester or expose splice sites. An RNA switch has
recently been identified in the 3’ untranslated region of p27 mRNA that
simultaneously sequesters both an miRNA target site, shielding it from
cleavage by the RNA-induced silencing complex (RISC), and a Pumilio-
recognition element (PRE), which binds a Pumilio RNA-binding protein
(PUM1). Binding of PUM1 to the PRE region triggers a secondary structural
switch that exposes the miRNA target site, leading to miRNA silenc-
ing (Fig. 3d). In another example, HDV genotype III editing levels
are determined by a pre-existing equilibrium between two second-
ary structures of the antigenome RNA, involving a kinetically trapped
conformation and a thermodynamically more favourable state. These
initial discoveries suggest RNA switches have a range of functions in
post-transcriptional processing.

Figure 3 | Functional outputs of secondary structural changes. a,
Transcriptional activation of the aminoacyl-tRNA synthetase gene by
uncharged tRNA (no aminoacylation (aa)). Binding of uncharged tRNA
induces formation of an antiterminator helix during co-transcriptional
folding”. b, Translational control of VEGFA expression through a dual protein-
dependent RNA secondary structural switch that responds to interferon-γ
(IFN-γ) by binding the IFN-γ-activated inhibitor of translation (GAIT)
complex (green) to form a translational-silencing (TS) conformer (on the
left) and to hypoxic stress that results in hnRNP L binding and causes a
switch to a translation-permissive (TP) conformer (on the right). c, Thiamine
pyrophosphate (TPP) riboswitch-regulated alternative splicing and gene
expression of NMT1. In the absence of TPP, the aptamer domain base-pairs
(red dotted line) with the sequence surrounding a proximal 5’ splice site (SS,
shown as coloured diamonds: green, activation; red, repression) to block
it from the SS machinery. Instead, a distal SS is selected. On binding TPP, the
aptamer domain undergoes a conformational change to expose the second
proximal 5’ SS. The resultant spliced mRNA contains decoy upstream open
reading frames (uORFs), thus reducing expression of the NMT1 ORF. d,
Pumilio protein-mediated mRNA secondary structural switch controls
accessibility of microRNA-binding sites and regulates expression of p27
protein. Binding of PUM1 induces a conformational change to expose the
miR-221 and miR-222 binding site to allow p27 silencing. RISC, RNA-induced
silencing complex. e, Secondary structural switch couples dimerization
and diploid genome packaging of the Moloney murine leukaemia virus.
Dimerization leads to a coupled frame-shift that exposes nucleocapsid protein
binding sites (green) required for genome packaging. NC, nucleocapsid.


Viral replication 

RNA genomes of retroviruses take advantage of RNA secondary struc- 
tural switches to transition between the different functions required 
for the various steps of the viral replication cycle. For example, there is 
evidence that the 5’ untranslated region of the HIV-1 genome can form 
two mutually exclusive secondary structures: a metastable branched 
multiple-hairpin conformation, which is involved in dimerization and 
packaging; and a more energetically favourable long-distance interac- 
tion conformation, which is involved in transcription and translation. 
The transition from the long-distance interaction to the branched 
multiple hairpin conformation is catalysed by the RNA chaperone 
nucleocapsid protein™. 

RNA switches can also couple distinct processes within a given step.
For example, an RNA switch is used to couple dimerization and selec-
tive encapsidation of two copies of the Moloney murine leukaemia
virus RNA genome. Dimerization of two RNA genomes induces a shift
in the base-pairing pattern within the Ψ-RNA packaging signal, which
exposes conserved UCUG elements that bind the nucleocapsid protein
with high affinity, thereby promoting genome packaging” (Fig. 3e).
In the monomeric RNA, these elements are base-paired and bind
nucleocapsid protein only weakly (Fig. 3e).


Functions of tertiary conformational changes

RNA tertiary conformational changes can range from large global
changes in the orientation of helices to more subtle local changes in
the structure of motifs that are involved in tertiary interactions. These
conformational transitions allow RNA molecules to bind adaptively to
a wide range of molecular partners and can help to direct the assembly
of RNPs.


Polyvalent binding

Some of the first solved structures of RNA-protein complexes revealed
a remarkable ability of RNA to undergo adaptive changes in confor-
mation*” that had the potential to allow the optimization of intermo-
lecular interactions with disparate targets. In a classic example, these
conformational changes allow tRNAs to interact with many diverse
partners, including ribonuclease P (RNase P), various nucleotide-
modifying enzymes, tRNA synthetase, EF-Tu, the ribosome and other
RNA elements. High-resolution structures of tRNA-RNA, tRNA-protein and
tRNA-RNP complexes show that binding is often accompanied by sig-
nificant conformational changes, which range from the reorientation of
helical domains to finer changes in local structure, all of which optimize
intermolecular interactions®™ (Fig. 4a).

Figure 4 | Functional outputs of tertiary conformational changes. a,
Different X-ray structures of tRNAPhe in the unbound state (black, PDB
ID 1EHZ), in complex with RNase P (blue, engineered anticodon stem
removed, PDB ID 3Q1Q), the ribosome in the P/E state (green, PDB ID
3R8N), isopentenyl-tRNA transferase (red, PDB ID 3FOZ), and phenylalanyl-
tRNA synthetase (yellow, PDB ID 1EIY). The structures are superimposed
by the acceptor stem. b, Hierarchical assembly of the central domain of
the 30S ribosomal subunit by successive protein-induced changes in the
conformation of 16S rRNA. S15 changes the orientation of the helical
domains to favour the binding of S6 and S18. c, Enzymatic cycle of the
hairpin ribozyme. d, Ratcheting motions of the ribosome seen through X-ray
crystallography. The degree of 30S subunit atomic displacement between
the unratcheted and R, ratcheted states with the 50S subunit as a reference
(not shown) is colour-coded (in Å). Atomic displacement vectors and
arrows (on the right) indicate the direction of the change. Figure reprinted,
with permission, from ref. 82. e, The free-energy landscape of ribosomal
ratcheting, as calculated from subclassification of cryoelectron microscopy
particles. Movements of the 30S subunit body and head domains in relation
to the 50S subunit are shown in units of degrees and arbitrary units (a.u.),
respectively, with corresponding tRNA translocation intermediates (Prel
and so on) outlined in black. Figure reprinted, with permission, from ref.
84. f, Dynamics of the 50S ribosomal L1 stalk monitored by single-molecule
fluorescence resonance energy transfer (smFRET). Representative smFRET
trace (top) and histogram (bottom left) of the L1 stalk dynamically sampling
open and closed conformations in A- and P-site tRNA-bound ribosome
complexes. Translocation by EF-G and tRNA occupation of the E- and P-sites
causes the L1 stalk conformation to shift dramatically (bottom right). Figure
modified, with permission, from ref. 100.


Ordering RNP assembly 

RNA tertiary conformational changes that are induced by succes- 
sive protein-binding events are thought to help direct the order of 
assembly of complex RNP machines, including the 30S ribosome”, 
the signal recognition particle’! and telomerase”. For example, the 
binding of ribosomal protein S15 to 16S rRNA initiates the ordered 
assembly of the central domain in the 30S ribosomal subunit”, and 
leads to a change in the orientation of helical domains that favours the 
binding of ribosomal proteins S6 and S18 (ref. 74) (Fig. 4b). Prema- 
ture binding of S6 and S18 to the unbound 16S rRNA may be disfa- 
voured, in part, because of the entropic penalty that is associated with 
the partial freezing-out of interhelical motions. Even in the simpler 
telomerase RNP (consisting of one RNA and two protein compo- 
nents), the binding of the first protein p65 induces a conformational 
change in the RNA that facilitates the binding of telomerase reverse 
transcriptase, thereby ordering assembly”. 
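Protein-induced ordering of RNP assembly can be represented as a dependency graph. Any valid assembly order is then a topological sort of that graph. The sketch below uses Python's standard-library graphlib and encodes only the dependencies named in this section. For the ribosome, these are 16S rRNA before S15, and S15 before S6 and S18. For telomerase, they are telomerase RNA before p65, and p65 before the reverse transcriptase, abbreviated TERT here. Real assembly maps contain many more dependencies, and the dictionary names are illustrative.

```python
from graphlib import TopologicalSorter

# Each key lists the components that must bind first (its predecessors).
# Only dependencies mentioned in the text are included.
RIBOSOME_CENTRAL_DOMAIN = {
    "S15": {"16S rRNA"},  # S15 binding initiates central-domain assembly
    "S6": {"S15"},        # S15-induced helix reorientation favours S6 ...
    "S18": {"S15"},       # ... and S18 binding
}

TELOMERASE = {
    "p65": {"telomerase RNA"},
    "TERT": {"p65"},  # p65-induced RNA change facilitates TERT binding
}


def assembly_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return one binding order consistent with every dependency."""
    return list(TopologicalSorter(dependencies).static_order())


print(assembly_order(RIBOSOME_CENTRAL_DOMAIN))
print(assembly_order(TELOMERASE))
```

A topological sort returns one permissible order, not the only one. Here S6 and S18 may appear in either order, because the text gives no dependency between them.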

Assembly can also involve coupled protein binding that induces 
changes in both RNA secondary and tertiary structure. For example, 
coupled binding of the maturase and Mrs1 protein cofactors to the
RNA of the bI3 group-I intron RNP both stabilizes the native tertiary
contacts and induces a reorganization of a non-native intermediate
secondary structure”. Although both Mrs1 dimers and maturase
can independently bind and stabilize portions of the bI3 tertiary 
structure, binding by both proteins is required to induce the second- 
ary structure rearrangement and assembly to the native, active state. 


Ribozyme catalysis 

Tertiary conformational transitions involving large changes in the 
orientation of helical arms are often observed in small ribozymes, 
such as the hairpin and HDV ribozymes, and are thought to be important for
the transition between the different steps of the catalytic cycles. Typi- 
cally, an undocked (inactive) conformation binds the substrate, pro- 
moting the transition into a docked (active) conformation, which is 
required for catalysis. After catalysis, another undocking transition 
allows the release of the product (Fig. 4c). The importance of these 
motions is demonstrated by the fact that the junction motions can 
accelerate the rate of folding of the active conformation”*. Similarly, 
large hinge-like motions of the J2a/b bulge in human telomerase have 
been proposed to help with dynamic telomere repeat synthesis”. A 
more exceptional example is the Tetrahymena group-I ribozyme that 
has been shown to interconvert between alternative tertiary con- 
formations, which have a range of substrate binding affinities but 
similar enzymatic activities’”®. The rates of interconversion between 
these states are slower than the rate of catalysis, implying the exist- 
ence of multiple native states. Such long-lived heterogeneities have 
been observed in the tertiary folds of many other RNAs, although 
some of these may be the result of RNA purification side-products”. 
The atomic-level structural differences between these species and the
source of the severe heterogeneity are still unknown, but they may
constitute yet another mechanism used by RNA to define a narrow
set of differentiated conformations; this should be an exciting
topic for future research.


Protein synthesis 

Perhaps the best example of the cell manipulating the intrinsic 
dynamic landscape of RNA to achieve a desired biological outcome 
is ribosome catalysis. Large-scale ratcheting motions are required 
for translation. The small and large subunits reorient with respect 
to one another through numerous structural intermediates that are 
driven by changes in the conformation of both the ribosomal RNAs 
and proteins*’™ (Fig. 4d). Data strongly indicate that all of these 
intermediates are relatively low-lying energy states that readily inter-
convert, which has been highlighted by the ability of the ribosome 
to spontaneously undergo full tRNA retrotranslocation*™® (Fig. 4e). 
This has led to a ‘Brownian machine’ model of the ribosome, where 
the ribosome’s functionality is derived in part by its ability to harness 
thermally driven equilibrium fluctuations and bias them to promote 
the translation process” (Fig. 4f). 
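The 'Brownian machine' idea can be illustrated with a toy ratchet simulation, in which unbiased thermal fluctuations are rectified into net forward motion. This is a hypothetical sketch, not a physical model of the ribosome. Positions stand for successive translocation intermediates, and every step is an unbiased ±1 fluctuation. A 'pawl' that occasionally locks in forward progress stands in, by assumption, for the effectors that bias the landscape. The function name and parameters are illustrative.

```python
import random


def ratchet_walk(steps: int, capture_prob: float, seed: int = 0) -> tuple[int, int]:
    """Toy Brownian ratchet over a chain of intermediate states.

    Every step is an unbiased thermal fluctuation of +1 or -1. A 'pawl' blocks
    moves below `floor`; after a forward move it locks in the new position
    with probability `capture_prob`. Returns (final position, final floor).
    """
    rng = random.Random(seed)
    position, floor = 0, 0
    for _ in range(steps):
        step = rng.choice((1, -1))
        if position + step < floor:
            continue  # fluctuation rejected: cannot slip behind the pawl
        position += step
        if step == 1 and rng.random() < capture_prob:
            floor = position  # forward progress is locked in
    return position, floor


free, _ = ratchet_walk(2000, capture_prob=0.0)
biased, locked = ratchet_walk(2000, capture_prob=0.2)
print(f"no capture: position {free}; with capture: position {biased} (floor {locked})")
```

The fluctuations themselves are never biased. Directionality comes entirely from capturing favourable excursions, which is the sense in which the ribosome is said to harness thermal motion.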

The cell combines these intrinsic ribosome dynamics with numer- 
ous effectors to achieve tight control over the complex transactions 
that are required by translation. One such transaction is the selection 
and proofreading of incoming tRNAs that are responsible for the 
ribosome’s remarkable ability to consistently discriminate between 
cognate and near- or non-cognate tRNAs, in which small differ- 
ences between the minihelices of incorrect and correct anticodon- 
codon pairs will lead to tRNA accommodation or rejection. Here, 
the formation of a cognate minihelix results in a kinked tRNA struc- 
ture and triggers a 30S ‘domain closure’ motion*®*™. This stabilizes 
tRNA-ribosome interactions and in turn promotes conformational 
rearrangements in the EF-Tu protein of the EF-Tu·GTP·tRNA ter-
nary complex that delivered tRNA to the ribosome; this results in
GTP hydrolysis by EF-Tu, release of tRNA from EF-Tu and initial tRNA
selection®”””. The second proofreading step that follows EF-Tu dis- 
sociation is thought to be driven by relaxation of the kinked tRNA. 
In cognate tRNAs, the strong interactions between the codon and the 
anticodon cause a bias of tRNA relaxation towards a conformation 
that is fully accommodated within the A-site. However, for near- 
cognate tRNAs, which have weak codon-anticodon interactions, the 
relaxation of the kinked tRNA can occur through other pathways 
that lead to rejection”””’”. Following tRNA accommodation, other 
factors, including EF-G™, other initiation factors”, recycling fac- 
tors’, release factors”, and even the identity and acylation state of 
the tRNA occupying the neighbouring ribosomal P-site”, act on the 
translation process, manipulating the ribosome’s dynamic landscape 
to drive efficient synthesis of the mRNA-encoded protein. 
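Simple arithmetic shows the benefit of following initial selection with a separate proofreading step. The sketch assumes that the two stages discriminate independently, so their selectivities multiply, as in the idealization used in classic kinetic-proofreading arguments. The selectivity values are hypothetical, since the text gives none, and the function name is illustrative.

```python
def overall_error(initial_selectivity: float,
                  proofreading_selectivity: float,
                  near_cognate_excess: float = 1.0) -> float:
    """Fraction of accepted tRNAs that are near-cognate after two filters.

    Each stage is assumed to pass near-cognate tRNAs 1/selectivity times as
    often as cognate ones, independently, so the factors multiply.
    near_cognate_excess is the ratio of near-cognate to cognate ternary complexes.
    A selectivity of 1.0 means that stage does not discriminate at all.
    """
    wrong_per_right = near_cognate_excess / (initial_selectivity * proofreading_selectivity)
    return wrong_per_right / (1.0 + wrong_per_right)


# Hypothetical selectivities of 100 per stage.
print(f"initial selection only: {overall_error(100.0, 1.0):.2e}")
print(f"with proofreading:      {overall_error(100.0, 100.0):.2e}")
```

Under these assumptions, a second stage with the same selectivity reduces the error rate roughly a hundredfold. This is why the separation of initial selection from proofreading matters.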

Owing to the overwhelming complexity of the ribosome, the mech- 
anisms and atomic level details of the many conformational transi- 
tions involved in the translation process remain unclear. Among these 
unresolved questions are how the ribosome’s RNA and protein com- 
ponents cooperate to confer dynamic specificity and robustness on 
ribosome dynamics”. Research into this process is another exciting 
area of future study, and we can confidently predict that this will be 
yet another biological system shown to rely heavily on the virtuosity 
of RNA dynamics. 


Outlook 

The conventional view that one sequence codes for one structure 
and one function is being replaced by a dynamic view of RNA as 
a pre-existing superposition of conformational states that can be 
resolved into a directed and synchronized motion by dedicated cel- 
lular machinery, leading to a broad range of functional outcomes. 
This makes it all the more important to study RNA dynamics within 
the complex in vivo environment of living cells, an important goal 
for the future. We also need to increase our basic understanding of 
RNA dynamic behaviour, even within the simpler in vitro environ- 
ment. It is remarkable that, even for well-studied molecules such as 
tRNA, there is very little experimental data available regarding the 
equilibrium fluctuations in tRNA at the atomic level; the same is also 
true for catalytically important motions in ribozymes. Similarly, little 
is known about the structure and dynamics of large RNAs, such as 
eukaryotic mRNAs. This will require the combined development of 
computational and experimental tools to move towards developing 
atomic-level movies of RNA in dynamic action within living cells as 
well as a better predictive understanding of RNA dynamic behaviour. 
In the meantime, great advances can be made by simply embracing 
this new dynamic view of RNA and always being on the lookout for 
another myoglobin.
RNA-guided genetic silencing
systems in bacteria and archaea 


Blake Wiedenheft, Samuel H. Sternberg & Jennifer A. Doudna


Clustered regularly interspaced short palindromic repeats (CRISPRs) are essential components of nucleic-acid-based
adaptive immune systems that are widespread in bacteria and archaea. Similar to RNA interference (RNAi) pathways 
in eukaryotes, CRISPR-mediated immune systems rely on small RNAs for sequence-specific detection and silencing 
of foreign nucleic acids, including viruses and plasmids. However, the mechanism of RNA-based bacterial immunity is 
distinct from RNAi. Understanding how small RNAs are used to find and destroy foreign nucleic acids will provide new 
insights into the diverse mechanisms of RNA-controlled genetic silencing systems. 


Bacteria and archaea are the most diverse and abundant organisms
on the planet, thriving in habitats that range from hot springs to
humans. However, viruses outnumber their microbial hosts in
every ecological setting, and the selective pressures imposed by these
rapidly evolving parasites have driven the diversification of microbial
defence systems’ ’. Historically, our understanding of antiviral immu-
nity in bacteria has focused on restriction-modification systems,
abortive-phage phenotypes, toxin-antitoxins and other innate defence
systems*”. More recently, bioinformatic, genetic and biochemical stud-
ies have revealed that many prokaryotes use an RNA-based adaptive
immune system to target and destroy genetic parasites (reviewed in refs
6-12). Such adaptive immunity, previously thought to occur only in
eukaryotes, provides an example of RNA-guided destruction of foreign
genetic material by a process that is distinct from RNA interference
(RNAi) (Fig. 1).

In response to viral and plasmid challenges, bacteria and archaea
integrate short fragments of foreign nucleic acid into the host chromo-
some at one end of a repetitive element known as CRISPR (clustered
regularly interspaced short palindromic repeat)'* '*. These repetitive
loci serve as molecular ‘vaccination cards’ by maintaining a genetic
record of prior encounters with foreign transgressors. CRISPR loci
are transcribed, and the long primary transcript is processed into a
library of short CRISPR-derived RNAs (crRNAs)'*”' that each con-
tain a sequence complementary to a previously encountered invading
nucleic acid. Each crRNA is packaged into a large surveillance complex
that patrols the intracellular environment and mediates the detection
and destruction of foreign nucleic acid targets".

CRISPRs were originally identified in the Escherichia coli genome
in 1987, when they were described as an unusual sequence element
consisting of a series of 29-nucleotide repeats separated by unique
32-nucleotide ‘spacer’ sequences”. Repetitive sequences with a similar
repeat-spacer-repeat pattern were later identified in phylogenetically
diverse bacterial and archaeal genomes, but the function of these repeats
remained obscure until many spacer sequences were recognized as
being identical to viral and plasmid sequences”*’. This observation
led to the hypothesis that CRISPRs provide a genetic memory of infec-
tion”’, and the detection of short CRISPR-derived RNA transcripts
suggested that there may be functional similarities between CRISPR-
based immunity and RNAi”. In this Insight, we review three stages of
CRISPR-based adaptive immunity and compare mechanistic aspects of
these immune systems to other RNA-guided genetic silencing pathways.
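The cycle of integration, transcription and processing described above can be summarized as a toy data structure. In this Python sketch, class and method names are hypothetical. Each acquired spacer is inserted at the leader end together with one new copy of the repeat, so the repeat-spacer-repeat architecture is preserved, and every spacer yields one guide. This is a simplification: real crRNAs also retain repeat-derived flanks, and real targeting relies on complementarity rather than the exact substring match used here.

```python
from dataclasses import dataclass, field


@dataclass
class CrisprLocus:
    """Toy CRISPR locus: leader, then repeat-(spacer-repeat)* architecture."""
    leader: str
    repeat: str
    spacers: list[str] = field(default_factory=list)  # index 0 = leader-proximal

    def acquire(self, protospacer: str) -> None:
        """Integrate a new spacer at the leader end; one repeat copy is added with it."""
        self.spacers.insert(0, protospacer)

    def sequence(self) -> str:
        """Assemble the locus: leader + repeat, then one spacer + repeat per spacer."""
        parts = [self.leader, self.repeat]
        for spacer in self.spacers:
            parts.extend([spacer, self.repeat])
        return "".join(parts)

    def crrnas(self) -> list[str]:
        """Simplified processing: one guide per spacer (repeat flanks omitted)."""
        return list(self.spacers)

    def matching_spacers(self, foreign: str) -> list[str]:
        """Spacers found verbatim in a foreign sequence (stand-in for complementarity)."""
        return [s for s in self.spacers if s in foreign]


locus = CrisprLocus(leader="LLLL", repeat="RRRR")
locus.acquire("AAAA")  # first infection
locus.acquire("CCCC")  # later infection: newest spacer sits next to the leader
print(locus.sequence())
print(locus.matching_spacers("xxAAAAyy"))
```

Storing spacers with index 0 at the leader end makes the locus read as a chronological record, newest first. This matches the 'vaccination card' analogy.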


Architecture and composition of CRISPR loci 

The defining feature of CRISPR loci is a series of direct repeats 
(approximately 20-50 base pairs) separated by unique spacer 
sequences of a similar length’’**** (Fig. 2). The repeat sequences 
within a CRISPR locus are conserved, but repeats in different CRISPR 
loci can vary in both sequence and length. In addition, the number 
of repeat-spacer units in a CRISPR locus varies widely within and 
among organisms”. 

The sequence diversity of these repetitive loci initially limited their 
detection and obscured their relationship, but computational methods 
have been developed for detecting repeat patterns rather than related 
sequences*******. One of the first-generation pattern-recognition algo-
rithms identified the repeat-spacer-repeat architecture in phylogeneti-
cally diverse bacterial and archaeal genomes, but related structures were 
not identified in eukaryotic chromosomes”. Comparative analyses of 
the sequences adjacent to the CRISPR loci have revealed an (A+T)-rich 
‘leader’ sequence that has been shown to serve as a promoter element 
for CRISPR transcription” ”. In addition to the leader sequence, Jansen 
et al.” identified a set of four CRISPR-associated (cas) genes known as 
cas1-4 that are found exclusively in genomes containing CRISPRs. Based 
on sequence similarity to proteins of known function, Cas3 was predicted 
to be a helicase and Cas4 a RecB-like exonuclease”.
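The pattern-based approach searches for direct repeats separated by spacers of similar length, rather than for related sequences, and can be sketched as follows. This is a simplified illustration that assumes exact repeat matches and caller-chosen length limits. Published CRISPR finders tolerate mismatches and use more efficient searches, and the function name is hypothetical.

```python
def find_repeat_arrays(seq: str, repeat_len: int, min_spacer: int,
                       max_spacer: int, min_copies: int = 3) -> list[tuple[str, list[int], list[str]]]:
    """Find runs of an exact direct repeat separated by spacers of bounded length.

    Returns (repeat, repeat start positions, spacers) for each array found.
    """
    arrays = []
    i = 0
    while i + repeat_len <= len(seq):
        repeat = seq[i:i + repeat_len]
        starts = [i]
        while True:
            prev_end = starts[-1] + repeat_len
            # Look for the next copy of the repeat after a spacer of allowed length.
            nxt = next(
                (prev_end + gap for gap in range(min_spacer, max_spacer + 1)
                 if seq[prev_end + gap:prev_end + gap + repeat_len] == repeat),
                None,
            )
            if nxt is None:
                break
            starts.append(nxt)
        if len(starts) >= min_copies:
            spacers = [seq[a + repeat_len:b] for a, b in zip(starts, starts[1:])]
            arrays.append((repeat, starts, spacers))
            i = starts[-1] + repeat_len  # resume scanning after this array
        else:
            i += 1
    return arrays


R = "GTCGGACTAA"  # hypothetical 10-nt repeat
genome = "CCCC" + R + "T" * 20 + R + "A" * 22 + R + "CCCC"
for repeat, starts, spacers in find_repeat_arrays(genome, 10, 15, 30):
    print(repeat, starts, [len(s) for s in spacers])
```

The search depends only on the repeat-spacer-repeat architecture, not on any particular repeat sequence. This is what allows loci with unrelated repeats to be detected by the same procedure.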

Subsequent bioinformatic analyses have shown that CRISPR loci are 
flanked by a large number of extremely diverse cas genes”. The cas1
gene is a common component of all CRISPR systems, and phylogenetic
analyses of Cas1 sequences indicate there are several versions of the 
CRISPR system. Providing additional evidence for the classification 
of distinct CRISPR types, neighbourhood analysis has identified con- 
served arrangements of between four and ten cas genes that are found 
in association with CRISPR loci harbouring specific repeat sequences”. 

These distinct immune systems have been divided into three major 
CRISPR types on the basis of gene conservation and locus organiza- 
tion'®. More than one CRISPR type is often found in a single organism, 
indicating that these systems are probably mutually compatible and could 
share functional components’’. Despite the variation in number and 
diversity of cas genes, the distinguishing feature of all type I systems is that 
they encode a cas3 gene. The Cas3 protein contains an N-terminal HD 
phosphohydrolase domain and a C-terminal helicase domain”. In 
some type I systems, the Cas3 nuclease and helicase domains are encoded 
by separate genes (cas3” and cas3’, respectively), but in each case they are 
thought to participate in degrading foreign nucleic acids”* (Fig. 2). 
Type II CRISPR systems consist of just four cas genes, one of which
is always cas9 (formerly referred to as csn1). Cas9 is a large protein 
that includes both a RuvC-like nuclease domain and an HNH nucle- 
ase domain. Studies in Streptococcus pyogenes and Streptococcus ther- 
mophilus have indicated that Cas9 may participate in both CRISPR RNA 
processing and target destruction'*’*"’. Two variations of the type III 
system have been identified (known as III-A and III-B). This division 
is supported by the functional differences reported in Staphylococcus 
epidermidis and Pyrococcus furiosus”’**. The immune system in S. epi- 
dermidis (type III-A) targets plasmid DNA in vivo, whereas the purified 
components of the type III-B system in P. furiosus have been found to
cleave only single-stranded RNA substrates in vitro. The functional dis- 
tinction between these two closely related systems suggests there could 
be other mechanistic differences between the distinct CRISPR subtypes. 


Integration of new information into CRISPR loci 

Acquisition of foreign DNA is the first step of CRISPR-mediated immu-
nity (Figs 2 and 3). During this stage, a short segment of DNA from an
invading virus or plasmid (known as the protospacer) is integrated pref- 
erentially at the leader end of the CRISPR locus'*””. Although metagen- 
omic studies performed on environmental samples indicate that 
CRISPRs evolve rapidly in dynamic equilibrium with resident phage 
populations’*””°, the type II system in S. thermophilus is currently the 
only CRISPR system that has been shown to robustly acquire new phage 
or plasmid sequences in a pure culture. Phage-challenge experiments in 
S. thermophilus have indicated that a small proportion of the cells in a
population will typically incorporate a single virus-derived sequence at
the leader end of a CRISPR locus’*"**"’. The CRISPR-repeat sequence
is duplicated for each new spacer sequence added, thus maintaining the
repeat-spacer-repeat architecture. Although the mechanism of spacer 
integration and replication of the repeat sequence is still unknown, 
studies in S. thermophilus and E. coli have indicated that several Cas 
proteins are involved in the process'*"**”**. Mutational analysis of the
cas genes in S. thermophilus demonstrated that csn2 (previously known
as cas7) is required for new spacer sequence acquisition’. This gene is
not conserved in other CRISPR types, which suggests that either the
mechanism of adaptation in S. thermophilus is distinct from the other
types or that there are functional orthologues of Csn2 in other systems.
Furthermore, gene deletion experiments in both S. thermophilus and
E. coli have shown that neither cas1 nor cas2 is required for
CRISPR RNA processing or targeted interference”**™*. These genetic
studies suggest a role for Cas1 and Cas2 in the integration of foreign
DNA into the CRISPR locus.

The role of Cas1 in CRISPR-mediated immunity is still uncertain;
however, biochemical and structural data indicate a function for Cas1 in
new-spacer-sequence acquisition °°. Cas1 proteins from Pseudomo-
nas aeruginosa”, E. coli** and Sulfolobus solfataricus” have been purified
and studied biochemically. The Cas1 protein from S. solfataricus has
been shown to bind nucleic acids with high affinity (Kd ranging from 20
to 50 nM), but without sequence preference”. The Cas1 protein from
E. coli also binds to DNA with a preference for mismatched or abasic
substrates”. This observation is consistent with a recent study show-
ing a physical and genetic interaction between E. coli Cas1 and several
proteins associated with DNA replication and repair™.

Activity assays with Cas1 from P. aeruginosa and E. coli indicate that
Cas1 is a metal-dependent nuclease. The Cas1 protein from P. aeruginosa
is a DNA-specific nuclease, whereas the Cas1 protein from E. coli had a
nuclease activity on a wider range of nucleic acid substrates™*”*. These
in vitro assays suggest that Cas1 proteins interact with nucleic acids in a
non-sequence-specific manner.

Crystal structures for five different Cas1 proteins are currently avail-
able (Protein Data Bank (PDB) identifiers: 3GOD, 3NKD, 3LFX, 3PV9
and 2YZS)*”*. Although the amino acid sequences for these proteins are
extremely diverse (less than 15% sequence identity), their tertiary and
quaternary structures are similar. All Cas1 proteins seem to share a two-
domain architecture consisting of an N-terminal β-strand domain and a

Figure 1 | Parallels and distinctions between CRISPR RNA-guided
silencing systems and RNAi. CRISPR systems and RNAi recognize
long RNA precursors that are processed into small RNAs, which act as
sequence-specific guides for targeting complementary nucleic acids. In
CRISPR systems, foreign DNA is integrated into the CRISPR locus, and long
transcripts from these loci are processed by a CRISPR-associated (Cas) or
RNase III family nuclease’**"™. The short CRISPR-derived RNAs (crRNAs)
assemble with Cas proteins into large surveillance complexes that target
destruction of invading genetic material’*””*’”*. In some eukaryotes, long
double-stranded RNAs are recognized as foreign, and a specialized RNase III
family endoribonuclease (Dicer) cleaves these RNAs into short interfering
RNAs (siRNAs) that guide the immune system to invading RNA viruses”.
PIWI-interacting RNAs (piRNAs) are transcribed from repetitive clusters in
the genome that often contain many copies of retrotransposons and primarily
act by restricting transposon mobility”. The biogenesis of piRNAs is
not yet fully understood. MicroRNAs (miRNAs) are also encoded on the
chromosome, and primary miRNA transcripts form stable hairpin structures
that are sequentially processed (shown by red triangles) by two RNase III
family endoribonucleases (Drosha and Dicer)”. miRNAs do not participate
in genome defence but are major regulators of endogenous gene expression”.
Like crRNAs, eukaryotic piRNAs, siRNAs and miRNAs associate with
proteins that facilitate complementary interactions with invading nucleic 
acid targets”. In eukaryotes, the Argonaute proteins pre-order the 5’ 
region of the guide RNA into a helical configuration, reducing the entropy 
penalty of interactions with target RNAs”. This high-affinity binding site, 
called the ‘seed’ sequence, is essential for target sequence interactions. Recent 
studies indicate that the CRISPR system may use a similar seed-binding 


mechanism for enhancing target sequence interactions**””*>™, 
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C-terminal a-helical domain (Fig. 3). The C-terminal domain contains a 
conserved divalent metal-ion binding site, and alanine substitutions of the 
metal-coordinating residues inhibit Cas1-catalysed DNA degradation™”. 
The metal ion is surrounded by a cluster of basic residues that form a
strip of positive charge across the surface of the C-terminal domain. This
positively charged surface may serve as an electrostatic snare to position
nucleic-acid substrates near the catalytic metal ions (Fig. 3). The Cas1
protein forms a stable homodimer through interactions between the two
β-strand domains, which are related by a pseudo-two-fold axis of
symmetry. This organization creates a saddle-like structure that can be
modelled onto double-stranded DNA without steric clashing. Two
β-hairpins, one from each of the two symmetrically related molecules,
hang on opposite faces of the double-stranded DNA (like stirrups on a
saddle). Although this feature of the Cas1 structure did not initially stand
out as a potential DNA-binding site, comparative analysis of the
available Cas1 structures reveals a conserved set of positively charged
residues along each of the β-hairpins that could contact the phosphate
backbone. The two β-hairpins, which are symmetrically related, might
participate in sequence-specific interactions with the CRISPR repeat,
whereas the large positively charged surface on the C-terminal α-helical
domain could account for the high-affinity, non-sequence-specific
interactions that have been observed in vitro.

In spite of these structural studies and biochemical results, it is still
only possible to speculate on the role of Cas1 in the integration of new
spacer sequences, and many steps associated with the integration process
still need to be explained. For example, new spacer sequences are inserted
preferentially at the leader end of the CRISPR, but the mechanism of
leader end recognition is unknown. One simple model suggests that the
leader sequence contains a recognition element that recruits the
integration machinery. It is equally possible that integration relies on
single-stranded regions of the CRISPR DNA that are made available during
transcription. Transcription-associated recombination is involved in
genome stability, and a mechanism that couples integration with
transcription would link the process of adaptation to CRISPR RNA
expression, ensuring that spacers from the most recent virus or plasmid
are transcribed first.

Figure 2 | Diversity of CRISPR-mediated adaptive immune systems
in bacteria and archaea. A diverse set of CRISPR-associated (cas)
genes (grey arrows) encode proteins required for new spacer sequence
acquisition (Stage 1), CRISPR RNA biogenesis (Stage 2) and target
interference (Stage 3). Each CRISPR locus consists of a series of direct
repeats separated by unique spacer sequences acquired from invading
genetic elements (protospacers). Protospacers are flanked by a short
motif called the protospacer adjacent motif (PAM, **) that is located on
the 5’ (type I) or 3’ (type II) side in foreign DNA. Long CRISPR
transcripts are processed into short crRNAs by distinct mechanisms. In
type I and III systems, a CRISPR-specific endoribonuclease (yellow ovals
and green circles, respectively) cleaves 8 nucleotides upstream of each
spacer sequence. In type III systems, the repeat sequence on the 3’
end of the crRNA is trimmed by an unknown mechanism (green pacman,
right). In type II systems, a trans-acting antisense RNA (tracrRNA) with
complementarity to the CRISPR RNA repeat sequence forms an RNA
duplex that is recognized and cleaved by cellular RNase III (brown ovals).
This cleavage intermediate is further processed at the 5’ end, resulting in
a mature, approximately 40-nucleotide crRNA with an approximately
20-nucleotide 3’-handle. In each system, the mature crRNA associates with
one or more Cas proteins to form a surveillance complex (green rectangles).
Type I systems encode a Cas3 nuclease (blue pacman), which may be
recruited to the surveillance complex following target binding. A short
high-affinity binding site called a seed sequence has been identified in some
type I systems, and genetic experiments suggest that type II systems have
a seed sequence located at the 3’ end of the crRNA spacer sequence.

The integration machinery must be able to distinguish foreign DNA
from that of the host genome. The molecular cues that are involved in the
distinction of ‘self’ from ‘non-self’ are still unknown, but sequencing of
CRISPR loci following phage challenge suggests that spacer sequences
are not selected at random. Mapping spacer sequences onto
viral genomes reveals a short sequence motif proximal to the protospacer,
which is referred to as the protospacer adjacent motif (PAM). PAM
sequences are only a few nucleotides long, and the precise sequence
varies depending on the CRISPR system type. This variation suggests that
one or more of the Cas proteins associated with each immune system is
involved in PAM recognition, but the mechanism governing this
specificity is unknown.
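To make the PAM constraint concrete, the sketch below scans one strand of a foreign DNA sequence for windows of spacer length that sit directly next to an exact PAM match, placing the PAM on the 5’ side (type I) or the 3’ side (type II) as stated in the Figure 2 caption. It is a minimal illustration under stated assumptions, not a model of how any Cas protein reads the PAM: the function name `find_protospacers`, the PAM string, the spacer length and the sequence are all hypothetical, only one strand is scanned, and PAM matching is exact.

```python
def find_protospacers(dna: str, pam: str, spacer_len: int,
                      pam_side: str = "5prime") -> list[tuple[int, str]]:
    """Return (start, protospacer) for each spacer_len window that sits
    directly next to an exact PAM match on one strand of `dna`.

    pam_side="5prime": PAM lies 5' of the protospacer (type I convention).
    pam_side="3prime": PAM lies 3' of the protospacer (type II convention).
    """
    if pam_side not in ("5prime", "3prime"):
        raise ValueError("pam_side must be '5prime' or '3prime'")
    dna, pam = dna.upper(), pam.upper()
    hits = []
    for i in range(len(dna) - len(pam) + 1):
        if dna[i:i + len(pam)] != pam:
            continue
        # Protospacer starts right after the PAM, or ends right before it.
        start = i + len(pam) if pam_side == "5prime" else i - spacer_len
        if start >= 0 and start + spacer_len <= len(dna):
            hits.append((start, dna[start:start + spacer_len]))
    return hits


# Hypothetical sequence, PAM and spacer length, chosen only for illustration.
print(find_protospacers("TTAAGCGCGCGCATAAGTTTT", "AAG", 4))
# [(5, 'CGCG'), (17, 'TTTT')]
```

Covering both strands would also require scanning the reverse complement; that step is left out to keep the sketch short.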


CRISPR RNA biogenesis 

Spacer acquisition is the first step of immunization, but successful
protection from bacteriophage or plasmid challenge requires the CRISPR to be
transcribed and processed into short CRISPR-derived RNAs (crRNAs).
crRNAs were first detected by small RNA profiling in Archaeoglobus
fulgidus and S. solfataricus. Northern-blot analysis using probes
against the repeat sequence of the CRISPR revealed a ‘ladder-like’
pattern of RNA consistent with a long precursor CRISPR RNA transcript
(pre-crRNA) that was processed at approximately 60-nucleotide
intervals. In fact, the 3’ ends of cloned crRNAs were mapped to the middle
of the CRISPR repeat, which suggested that the repeat sequence was
recognized and cleaved.

The need for crRNAs in CRISPR-mediated defence was demonstrated
initially by investigation of a CRISPR-specific endoribonuclease
in E. coli called Cas6e (formerly known as Cse3 or CasE). Cas6e
specifically binds and cleaves within each repeat sequence of the long
pre-crRNA, resulting in a library of crRNAs that each contain a unique
spacer sequence flanked by fragments of the adjacent repeats. Mutation
of a conserved histidine blocks crRNA biogenesis and leaves the cell
susceptible to phage infection.

The Cas6e protein consists of a double ferredoxin-like fold that
selectively associates with specific RNA repeats and does not associate with
DNA or CRISPR RNAs containing a non-cognate repeat sequence
(Fig. 4). Crystal structures of Cas6e bound to a CRISPR RNA
repeat reveal a combination of sequence- and structure-specific
interactions that explain the molecular mechanism of substrate recognition.
The repeat sequence of the E. coli CRISPR is partially palindromic, and
the RNA forms a stable (approximately 20-nucleotide) stem loop. A
positively charged β-hairpin in Cas6e interacts with the major groove of
the RNA duplex, which positions the 3’ strand of the crRNA stem along a
conserved, positively charged cleft on one face of the protein (Fig. 4).
RNA binding induces a conformational change that disrupts the
bottom base pair of the stem and positions the scissile phosphate within
the enzyme active site for site-specific cleavage. CRISPR RNA
cleavage occurs 8 nucleotides upstream of the spacer sequence, which results
in 61-nucleotide mature crRNAs consisting of a 32-nucleotide spacer
flanked by 8 nucleotides of the repeat sequence on the 5’ end (known
as the 5’-handle) and 21 nucleotides of the remaining repeat sequence
on the 3’ end (Fig. 4). Cas6e remains tightly bound to the 3’ stem-loop
and may serve as a nucleation point for assembly of a large effector
complex, Cascade (CRISPR-associated complex for antiviral defence), that is
required for phage silencing in the next stage of the immune system
(discussed later).
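The length bookkeeping in this paragraph can be checked with a toy model. Because a mature crRNA carries the last 8 nucleotides of one repeat, a 32-nucleotide spacer and the first 21 nucleotides of the next repeat, each repeat in this idealized E. coli array works out to 8 + 21 = 29 nucleotides, and each repeat-spacer unit to 29 + 32 = 61 nucleotides, in line with the roughly 60-nucleotide ladder seen on Northern blots. The sketch below assumes a leaderless array that begins and ends with identical repeats and simply cuts 8 nucleotides upstream of every spacer. The function name `process_pre_crrna` and the placeholder sequences are hypothetical, and the sketch does not model RNA structure or enzyme specificity.

```python
def process_pre_crrna(pre_crrna: str, repeat_len: int, spacer_len: int,
                      handle_len: int = 8) -> list[str]:
    """Toy Cas6e-style processing of an idealized repeat-spacer array.

    Assumes the array is R S1 R S2 ... R (starts and ends with a repeat of
    repeat_len nt; every spacer is spacer_len nt). Each repeat is cut
    handle_len nt upstream of its 3' end (the repeat/spacer junction), and
    only fragments carrying a complete spacer are returned.
    """
    unit = repeat_len + spacer_len
    n_spacers = (len(pre_crrna) - repeat_len) // unit
    if len(pre_crrna) != repeat_len + n_spacers * unit:
        raise ValueError("array does not match the assumed R(SR)* layout")
    # Cut site inside the k-th repeat: handle_len nt before its 3' end.
    cuts = [k * unit + repeat_len - handle_len for k in range(n_spacers + 1)]
    return [pre_crrna[a:b] for a, b in zip(cuts, cuts[1:])]


# Hypothetical 29-nt repeat (21 nt + 8-nt handle region) and two 32-nt spacers.
repeat = "G" * 21 + "C" * 8
pre = repeat + "A" * 32 + repeat + "T" * 32 + repeat
mature = process_pre_crrna(pre, repeat_len=29, spacer_len=32)
print([len(m) for m in mature])  # [61, 61]
```

Each returned fragment is 8-nt handle + spacer + 21 nt of the next repeat, matching the 61-nucleotide crRNA described above.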
Crystal structures of CRISPR-specific endoribonucleases from two
other immune systems offer additional insights into the co-evolutionary
relationship between these specialized enzymes and their cognate
RNAs (Fig. 4). In P. aeruginosa, Cas6f (previously known as Csy4)
specifically binds and cleaves the CRISPR RNA repeat 8 nucleotides
upstream of the spacer sequence, which leaves a similar 8-nucleotide
5’-handle on mature crRNAs. The co-crystal structure of Cas6f bound
to its cognate RNA reveals interesting parallels between the modes of
RNA binding used by Cas6f and Cas6e. Like Cas6e, the P. aeruginosa
Cas6f protein recognizes the sequence and shape of a stable stem-loop
in the crRNA repeat sequence by interacting extensively with the major
groove of the double-stranded RNA. However, the structural elements
responsible for this interaction are distinct between the two proteins
(Fig. 4). The Cas6f protein has a two-domain architecture, which
consists of an N-terminal ferredoxin-like fold similar to that in Cas6e, but
its C-terminal domain is structurally distinct. An arginine-rich helix
in the C-terminal domain of Cas6f inserts into the major groove of the
crRNA duplex, and the bottom of the crRNA stem is positioned for
sequence-specific hydrogen-bonding contacts in the RNA major groove. These
contacts position the scissile phosphate of the crRNA in the enzyme
active site so that cleavage occurs 8 nucleotides upstream of the spacer
sequence (Fig. 4).

Although Cas6f and Cas6e recognize the sequence and shape of the
crRNA hairpin in their respective systems, CRISPR RNA repeats in
other CRISPR systems are thought to be unstructured. For example,
the Cas6 protein from P. furiosus associates with CRISPR transcripts that
are expected to contain unstructured repeats. The specific recognition
of an unstructured RNA repeat requires a distinct mechanistic solution
for RNA substrate discrimination. Remarkably, crystallographic studies
of the Cas6 protein from P. furiosus have revealed the same duplicated
ferredoxin-like fold observed in the Cas6e protein, but with a different
mode of RNA recognition involving the opposite face of the protein
(Fig. 4). In Cas6, the two ferredoxin-like folds clamp the 5’ end of the
single-stranded RNA repeat sequence in place. Although the RNA in
this structure is disordered in the enzyme active site, biochemical studies
have shown that cleavage occurs 8 nucleotides upstream of the spacer
sequence. While the nucleotide sequences at the cleavage site vary for
each of the different Cas6 proteins, all Cas6 family endoribonucleases
cleave their cognate RNA 8 nucleotides upstream of the spacer sequence
using a metal-ion-independent mechanism.

Despite advances in our understanding of crRNA biogenesis, the
diversity of cas genes has obscured identification of the protein
factors responsible for CRISPR RNA processing in some systems. Type II
immune systems consist of four cas genes, none of which has detectable
sequence similarity to known CRISPR-specific endoribonucleases.
Recently, a different CRISPR RNA processing mechanism has been
reported that involves RNase-III-mediated cleavage of double-stranded
regions of the CRISPR RNA repeats. The first indication of this
mechanism came from deep sequencing of RNA from S. pyogenes. An abundant
transcript containing a 25-nucleotide sequence that was
complementary to the CRISPR repeat was identified. This RNA, termed
tracrRNA (trans-activating CRISPR RNA), is encoded on the opposite strand
and just upstream of the CRISPR locus. Genetic and biochemical experiments
demonstrated that tracrRNA and pre-crRNA are co-processed by RNase
III, which produces cleavage products with a 2-nucleotide 3’ overhang.
In vivo processing of CRISPR RNAs required Cas9 (previously known as
Csn1), although a precise role for this enzyme in RNA processing has not
yet been defined. The essential role of cellular proteins that are not solely
involved in CRISPR-mediated defence, such as RNase III, indicates that
different host factors may be involved as ancillary components of these
immune systems.
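The tracrRNA was recognized because part of its sequence could base-pair with the CRISPR repeat. One simple way to look for such a stretch computationally is to take the reverse complement of the repeat and find the longest substring it shares with a candidate transcript, as sketched below. This is an illustrative assumption, not the method used in the deep-sequencing study: the function names and sequences are hypothetical, only perfect Watson-Crick pairs are counted (G-U wobble pairs and mismatches are ignored), and both inputs are written 5’ to 3’ in the RNA alphabet.

```python
# Watson-Crick complements in the RNA alphabet (T is accepted and treated as U).
_COMPLEMENT = str.maketrans("ACGUT", "UGCAA")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA written 5'->3', returned 5'->3'."""
    return rna.upper().translate(_COMPLEMENT)[::-1]


def longest_complementary_stretch(query: str, target: str) -> str:
    """Longest contiguous stretch of `query` that can pair, antiparallel and
    without mismatches, with some stretch of `target`.

    Implemented as a longest-common-substring search (dynamic programming)
    between `query` and the reverse complement of `target`.
    """
    query = query.upper()
    rc = reverse_complement(target)
    best_len, best_end = 0, 0
    prev = [0] * (len(rc) + 1)
    for i in range(1, len(query) + 1):
        cur = [0] * (len(rc) + 1)
        for j in range(1, len(rc) + 1):
            if query[i - 1] == rc[j - 1]:
                cur[j] = prev[j - 1] + 1      # extend the diagonal match
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return query[best_end - best_len:best_end]


# Hypothetical candidate transcript and repeat fragment.
print(longest_complementary_stretch("CCAGCCUUCC", "AAGGCU"))  # AGCCUU
```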


crRNA-guided interference 

The third stage of CRISPR-mediated immunity is target interference
(Fig. 2). Here crRNAs associate with Cas proteins to form large
CRISPR-associated ribonucleoprotein complexes that can recognize invading
nucleic acids. Foreign nucleic acids are identified by base-pairing
interactions between the crRNA spacer sequence and a complementary sequence
from the intruder. Phage- and plasmid-challenge experiments performed
in several model systems have demonstrated that crRNAs complementary
to either the coding or the non-coding strand of the invading DNA can
provide immunity. This is indicative of an RNA-guided DNA-targeting
system, and indeed a pathway for DNA silencing has recently
been demonstrated in S. thermophilus. DNA sequencing and Southern
blots indicated that both strands of the target DNA are cleaved within
the region that is complementary to the crRNA spacer sequence. This
mechanism efficiently eliminates foreign DNA sequences that have
been specified by the spacer region of the crRNA, but avoids targeting
the complementary DNA sequences in the CRISPR region of the host
chromosome. The mechanism for distinguishing self from non-self is
built into the crRNA. The spacer sequence of each crRNA is flanked by
a portion of the adjacent CRISPR repeat sequence, and any
complementarity beyond the spacer into the adjacent repeat region signals
self and prevents the destruction of the host chromosome.

However, not all CRISPR systems target DNA. In vitro experiments
using enzymes from the type III-B CRISPR system of P. furiosus have
shown that this system cleaves target RNA rather than DNA. All
DNA-targeting systems encode a complementary DNA sequence for each
crRNA in the CRISPR locus and therefore require a mechanism for
distinguishing self (CRISPR locus) from non-self (invading DNA). In contrast,
systems that target RNA may not be required to make this distinction
because most CRISPR loci are transcribed only in one direction and thus
do not generate complementary RNA targets. CRISPR systems that
target RNA may be uniquely capable of defending against viruses that have
RNA-based genomes. However, adaptation of the CRISPR in response
to a challenge by an RNA-based virus will probably require the invading
RNA to be reverse-transcribed into DNA before it can be integrated into
the CRISPR locus.

Figure 3 | Steps leading to new spacer integration. a, The Cas1 protein
forms a stable homodimer in which the two molecules (green and grey) are
related by a pseudo-two-fold axis of symmetry (PDB ID: 3GOD). This
organization creates a saddle-like structure in the N-terminal domain,
in which β-hairpins (blue), one from each symmetrically related
molecule, hang like stirrups approximately 20 Å apart; these hairpins may
interact with the phosphodiester backbone of double-stranded DNA. An
electrostatic surface representation (bottom) reveals a cluster of basic
residues (blue) that form a positively charged strip across the
metal-binding surface of the C-terminal domain. This strip may serve as an
electrostatic trap that positions DNA substrates close to the catalytic
metal ions (green sphere). b, CRISPR adaptation occurs by integrating
fragments of foreign nucleic acid preferentially at the leader end of the
CRISPR, forming new repeat-spacer units in the process. Protospacers
are chosen non-randomly and may be selected from regions flanking the
protospacer adjacent motif (PAM). Coordinated cleavage of the foreign
DNA and integration of the protospacer into the leader end of the CRISPR
occurs through a mechanism that duplicates the repeat sequence and
thus preserves the repeat-spacer-repeat architecture of the CRISPR locus.
Although the protein components required for this process have not been
conclusively identified, Cas1 and other general recombination or repair
factors have been implicated (blue ovals).
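The crRNA-encoded discrimination between self and non-self described earlier in this section can be sketched as a simple rule. In the sketch below, sequences are written in the same sense as the crRNA (T in place of U), so base-pairing with the target strand is modelled as identity; a match to the spacer alone marks a foreign target, whereas a match that continues into the repeat-derived 5’-handle marks the host CRISPR locus. This is a deliberate simplification under stated assumptions: the function name, return labels and sequences are hypothetical, only the first spacer match is examined, and the whole 8-nucleotide handle must match, whereas real systems may check only a few positions.

```python
def classify_target(dna: str, spacer: str, handle: str) -> str:
    """Classify a DNA strand for a crRNA made of `handle` + `spacer`.

    Sequences are written in the crRNA's own sense (T for U), so identity here
    stands in for base-pairing with the complementary target strand.
    Returns 'no_target', 'self' or 'non_self'.
    """
    pos = dna.find(spacer)          # only the first match is examined
    if pos == -1:
        return "no_target"
    # Sequence immediately 5' of the matched protospacer.
    flank = dna[max(0, pos - len(handle)):pos]
    # Pairing that continues from the spacer into the repeat-derived handle
    # is what the host CRISPR locus itself looks like: treat it as self.
    if flank == handle:
        return "self"
    return "non_self"


handle = "ATTTCTAA"                 # hypothetical 8-nt repeat-derived 5'-handle
spacer = "GCGTACGTTAGC"             # hypothetical (shortened) spacer
print(classify_target("CC" + handle + spacer + "GG", spacer, handle))  # self
print(classify_target("CCTTGACCAT" + spacer + "GG", spacer, handle))  # non_self
```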


Cas proteins directly participate in target binding. Recent
biochemical studies have shown that CRISPR-associated complexes
facilitate target recognition by enhancing sequence-specific
hybridization between the CRISPR RNA and complementary target
sequences. A short high-affinity binding site located at one end of
the crRNA spacer sequence governs the efficiency of target binding,
and viruses that acquired a single mismatch in this region were able
to escape detection by the immune system. This high-affinity binding
site is functionally analogous to the ‘seed’ sequence (Fig. 1) that
has been identified in eukaryotic microRNAs (miRNAs). Structural and
biochemical studies have shown that Argonaute proteins
facilitate target recognition by pre-ordering the nucleotides at the
5’ end of the miRNA in a helical configuration. This pre-ordering
reduces the entropic penalty that is associated with helix formation
and provides a thermodynamic advantage for target binding
within this region. A similar mechanism may occur during crRNA
target binding, providing an interesting example of convergent
evolution between CRISPR-based immunity in prokaryotes and RNAi
in eukaryotes (Fig. 1).
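The practical consequence of a seed, namely that a single mismatch inside it can let a virus escape while mismatches elsewhere are better tolerated, can be expressed as a toy targeting rule. The sketch below is a hypothetical model, not a measured one: the function name, the seed length, which end of the spacer carries the seed (the text places it at the 3’ end of the spacer for type II systems) and the number of mismatches tolerated outside the seed are all free parameters chosen for illustration. Sequences are compared in the same sense, so identity stands in for complementarity.

```python
def is_targeted(spacer: str, protospacer: str, seed_len: int,
                seed_at_5prime: bool = True,
                max_outside_mismatches: int = 2) -> bool:
    """Toy seed rule: any mismatch inside the seed abolishes targeting;
    up to `max_outside_mismatches` are tolerated elsewhere.

    Both sequences are written in the same sense, so identity stands in for
    complementarity. All parameters are illustrative, not measured values.
    """
    if len(spacer) != len(protospacer):
        raise ValueError("spacer and protospacer must have equal length")
    if seed_at_5prime:
        seed = range(0, seed_len)
    else:
        seed = range(len(spacer) - seed_len, len(spacer))
    mismatches = [i for i, (a, b) in enumerate(zip(spacer, protospacer)) if a != b]
    seed_mismatches = sum(1 for i in mismatches if i in seed)
    outside_mismatches = len(mismatches) - seed_mismatches
    return seed_mismatches == 0 and outside_mismatches <= max_outside_mismatches


spacer = "ACGT" * 5                                   # hypothetical 20-nt spacer
escape = spacer[:1] + "G" + spacer[2:]                # one mismatch at position 1
tolerated = spacer[:15] + "A" + spacer[16:]           # one mismatch at position 15
print(is_targeted(spacer, escape, seed_len=8))        # False: seed mismatch
print(is_targeted(spacer, tolerated, seed_len=8))     # True: outside the seed
```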

Structural and biochemical studies have been performed on
CRISPR-associated complexes isolated from three different type I
CRISPR systems. These complexes seem to share some general
morphological features, but the precise spatial arrangement of
the Cas proteins and their interactions with the crRNA have been
unclear. Sub-nanometre-resolution structures of the CRISPR-associated
complex from E. coli (Cascade) have recently been determined
using cryo-electron microscopy. This complex is composed of an
unequal stoichiometry of 5 functionally essential Cas proteins and
a 61-nucleotide crRNA. The structure reveals a sea-horse-shaped
architecture in which the crRNA is displayed along a helical
arrangement of protein subunits that protect the crRNA from
degradation. The 5’ and 3’ ends of the crRNA form unique structures
that are anchored at opposite ends of the Cascade complex,
displaying the 32-nucleotide spacer sequence for base-pairing with
complementary targets.

Figure 4 | Diverse mechanisms of CRISPR RNA biogenesis. CRISPR
RNA repeats are specifically recognized and cleaved by diverse
mechanisms. In type I CRISPR systems, Cas6e (PDB ID: 2Y8W) and Cas6f
(PDB ID: 2XLK) recognize the major groove of the crRNA stem-loop
primarily through electrostatic interactions using a β-hairpin and an
α-helix, respectively. Cleavage occurs at the double-stranded-single-stranded
junction (black arrows), leaving an 8-nt 5’-handle on mature crRNAs. In
type II CRISPR systems, tracrRNA hybridizes to the pre-crRNA repeat
to form duplex RNAs that are substrates for endonucleolytic cleavage by
host RNase III (PDB ID: 2EZ6), an activity that may also require Cas9
(ref. 17). Subsequent trimming (red arrows) by an unidentified nuclease
removes leftover repeat sequences from the 5’ end. Cas6 (PDB ID: 3PKM)
in type III-B CRISPR systems specifically recognizes single-stranded
RNA, upstream of the scissile phosphate, on a face of the protein opposite
that of the previously identified active site residues. The remainder of
the repeat substrate probably wraps around the protein (red dashed line)
to allow cleavage 8 nucleotides upstream of the repeat-spacer junction.
Subsequent 3’ trimming (red arrows) generates mature crRNAs of two
discrete lengths. The N-terminal domain of all Cas6 family proteins
adopts a ferredoxin-like fold (light blue). The C-terminal domain of Cas6
and Cas6e also adopts a ferredoxin-like fold, but the C-terminal domain of
Cas6f is structurally distinct (dark blue).

The structure of Cascade bound to a 32-nucleotide target sequence
reveals a concerted conformational change that could be a signal for
recruiting Cas3. Cas3, the trans-acting nuclease of type I CRISPR
systems, may function as a target ‘slicer’ in a similar way to Argonaute in
RNAi pathways. Although Cas3 was implicated previously in the
process of self versus non-self discrimination, recent studies have
demonstrated that Cascade recognizes the PAM directly and that mutations
in the PAM decrease Cascade’s affinity for the target. The importance
of the PAM is highlighted by the recovery of phage and plasmid escape
mutants, which frequently contain a single mutation in the PAM.
The structure of Cascade indicates that the PAM is positioned near the
‘tail’ of the sea-horse-shaped complex. High-resolution structures and
mutational analysis of the nucleic acid and protein components in this
and related systems are needed to determine the mechanisms of target
authentication and degradation.


Applications of CRISPR structure and function

The sequence diversity of CRISPR loci, even within closely related
strains, has been used for high-resolution genotyping and forensic
medicine. This technique, known as spoligotyping (spacer oligotyping), has
been applied successfully to the analysis of human pathogens, including
Mycobacterium tuberculosis, Corynebacterium diphtheriae and Salmonella
enterica. Spoligotyping was developed long before the function
of CRISPRs was understood, but now that studies have begun to reveal
the biological function and mechanism of CRISPR-mediated genetic
silencing, new opportunities for creative applications have emerged.
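For the spoligotyping approach described above, the underlying data reduce a strain's CRISPR locus to the presence or absence of each spacer in a fixed reference panel, so two isolates can be compared position by position. The sketch below shows only that encoding and comparison. In practice the pattern is read out by hybridization to spacer oligonucleotide probes, which is not modelled here, and the spacer identifiers and function names are hypothetical.

```python
def spoligotype(strain_spacers: set[str], panel: list[str]) -> str:
    """Binary pattern: '1' if the panel spacer is present in the strain, else '0'."""
    return "".join("1" if spacer in strain_spacers else "0" for spacer in panel)


def pattern_distance(a: str, b: str) -> int:
    """Number of panel positions at which two spoligotypes differ."""
    if len(a) != len(b):
        raise ValueError("patterns must come from the same panel")
    return sum(x != y for x, y in zip(a, b))


panel = ["sp1", "sp2", "sp3", "sp4"]          # hypothetical reference spacers
isolate_a = spoligotype({"sp1", "sp3", "sp4"}, panel)
isolate_b = spoligotype({"sp1", "sp4"}, panel)
print(isolate_a, isolate_b, pattern_distance(isolate_a, isolate_b))  # 1011 1001 1
```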


Laboratory strains of bacteria are grown in high-density bioreactors for 
many different applications in the food industry, and they are becoming 
increasingly important in the production of biofuels. CRISPR systems 
offer a natural mechanism for adapting economically important bacteria 
for resistance against multiple phages. 

The biochemical activities of various Cas proteins may have useful
applications in molecular biology in much the same way that DNA
restriction enzymes have revolutionized cloning and DNA manipulation.
A wide range of CRISPR-specific endoribonucleases that recognize small
RNA motifs with high affinity expands the number of tools available for
manipulating nucleic acids. In addition, a crRNA-guided ribonucleoprotein
complex in P. furiosus was shown to cleave target RNAs. Site-specific
cleavage of target RNA molecules could have a range of uses, from
generating homogeneous termini after in vitro transcription to targeting a
specific intracellular messenger RNA for inactivation in a similar way to
RNAi. CRISPRs also provide a new mechanism for limiting the spread of
antibiotic resistance or the transfer of virulence factors by blocking
horizontal gene transfer. In addition, CRISPRs participate in a regulatory
mechanism that alters biofilm formation in P. aeruginosa. Although
the clinical relevance of CRISPRs remains to be demonstrated, the
opportunities for creative implementation of this new gene-regulation
system are potentially vast.


Future directions of CRISPR biology 

The discovery of some of the fundamental mechanisms of CRISPR-based
adaptive immunity has raised new questions and highlighted the
areas with the greatest potential for future research. Although CRISPR
RNA processing and targeting steps are now understood in some detail,
how and when target sequences are identified during a phage infection
or plasmid transformation remains unclear. Furthermore, why DNA or
RNA target sequences are chosen, and what their fate is once they are
bound to a crRNA-targeting complex, are not well understood. In addition,
the mechanisms by which foreign sequences are selected and integrated
into CRISPR loci are almost entirely unknown. Some CRISPR loci seem
to be considerably more active than others, at least under laboratory
conditions, so the choice of model organisms will be important. The
diversity and prevalence of CRISPR systems throughout microbial
communities ensure that new findings and applications in this field will be
forthcoming in the years ahead.
Modular regulatory principles
of large non-coding RNAs 


Mitchell Guttman & John L. Rinn


It is clear that RNA has a diverse set of functions and is more than just a messenger between gene and protein. The 
mammalian genome is extensively transcribed, giving rise to thousands of non-coding transcripts. Whether all of these 
transcripts are functional is debated, but it is evident that there are many functional large non-coding RNAs (ncRNAs). 
Recent studies have begun to explore the functional diversity and mechanistic role of these large ncRNAs. Here we 
synthesize these studies to provide an emerging model whereby large ncRNAs might achieve regulatory specificity through 
modularity, assembling diverse combinations of proteins and possibly RNA and DNA interactions. 


More than half a century after being placed as the central
intermediate in the flow of genetic information from gene to
protein, it is now accepted that RNA can perform diverse roles.
Shortly after the discovery of messenger RNA, a large class of
heterogeneous nuclear RNAs (hnRNAs) was described, which did not include
mRNA or associate with polyribosomes. Following years of sifting through
these hnRNAs, the first RNA subfamilies were identified. These included small
nuclear RNAs involved in splicing regulation and small nucleolar RNAs
involved in ribosome biogenesis, as well as the ribosomal RNAs and
transfer RNAs involved in translation.

The world of RNA genes became even more complex with the discovery
of RNAs that resembled mRNA in length and splicing structure but did
not code for proteins. The first example was H19, which was identified as
an RNA that was induced during liver development in the mouse. The
mouse H19 transcript contained no large open reading frames (ORFs), but
instead only small sporadic ORFs that were not evolutionarily conserved,
did not template translation in vivo and did not produce an identifiable
protein product. Shortly afterwards, another non-coding RNA (ncRNA),
termed XIST, was found to be expressed exclusively from the inactive X
chromosome and later demonstrated to be required for X inactivation in
mammals. Over the next two decades, more large ncRNA genes were
discovered, including Airn, Tug1 (ref. 12), NRON and HOTAIR. With the
availability of a draft sequence of the human genome, it became clear that
much of the mammalian genome is transcribed. These transcripts were
mapped to discrete loci throughout the genome. Over the next 10 years,
both large and small RNA transcripts were discovered at an unprecedented
rate. However, the functional significance of most of these transcripts
was unclear. Although some of these could be considered noise, there
are still many large ncRNAs that are known to have diverse functions.

This Review focuses on the classic examples of large ncRNAs that have 
helped to form the basis of more recent global studies of coding potential, 
function and mechanism. We discuss the concepts that have emerged 
from these examples that provide a framework for understanding the 
principles of RNA interactions. We propose that by assembling distinct 
regulatory components, large ncRNAs could produce intricate functional 
specificity, which is suggestive of a possible modular RNA code. 


RNA maps 

After the sequencing of the human genome, the next major hurdle was
to define the genes it encoded. To do this, several research groups
developed tiling microarrays and complementary DNA sequencing
methods to investigate transcriptional activity across the human
genome, which led to the observation of widespread transcription of
the genome. These studies, although limited to specific tissues and cell
types, demonstrated that the mammalian genome encodes many
thousands of non-coding transcripts, including both short (<200 nucleotides
in length) and long (>200 nucleotides in length) transcripts. In this
Review, we focus on large ncRNAs produced from long transcripts,
including those that originate from intergenic loci or overlapping
protein-coding genes.

Dramatic innovations in sequencing technologies have allowed the
deep sequencing of cDNAs, known as RNA-Seq; this deep sequencing,
coupled with new computational methods for assembling the
transcriptome, has identified non-coding transcripts across many different
cell types and tissues. It is now clear that there are thousands of
well-expressed large ncRNAs with exquisite cell-type and tissue specificity.

As the numbers of identified non-coding transcripts increased, so
did the uncertainty regarding their function; this led some authors to
express concern that many of these transcripts may be just
transcriptional noise with no function, or incidental by-products of
transcription from enhancer regions. These concerns are supported by the
observations that many of these transcripts are expressed at extremely
low levels and have lower levels of evolutionary conservation than
protein-coding genes. Although some of these transcripts may indeed be
transcriptional noise, the remaining transcripts consist of many distinct
subclasses, including processed small RNAs, promoter-associated RNAs,
transcripts from enhancer regions and functional large ncRNAs; each
class varies in its expression and conservation properties.
Distinguishing between these classes of RNA transcripts requires
additional biological information, including the coding potential of
the RNA and the chromatin modifications of the corresponding genomic
region (Fig. 1a).


Chromatin signatures 

Genomic DNA is wrapped around histone proteins and packaged into
higher-order structures termed chromatin. These histones can be
modified in different ways that are indicative of the functional state
of the underlying DNA. Advances in sequencing technologies have allowed
the comprehensive characterization of the chromatin-modification
landscape of mammalian genomes. These studies revealed combinations
of histone modifications (termed chromatin signatures) that
correspond to various gene properties, including a signature for active
transcription. This signature consists of a short stretch of
trimethylation of histone protein H3 at the lysine in position 4
(H3K4me3), which corresponds to promoter regions, followed by a longer
stretch of trimethylation of histone H3 at the lysine in position 36
(H3K36me3), which covers the entire transcribed region (Fig. 1a).
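The K4-K36 signature can be thought of as an interval-pairing problem: a short promoter mark followed closely by a long gene-body mark. The Python sketch below is illustrative only. It assumes that H3K4me3 peaks and H3K36me3 domains have already been called as simple intervals, considers plus-strand genes only, and uses hypothetical thresholds (`max_gap`, `min_k36_len`); the names `Interval`, `k4_k36_domains` and `intergenic_only` are likewise hypothetical. Published analyses are strand-aware and rely on statistically defined domains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def k4_k36_domains(k4me3_peaks, k36me3_domains, max_gap=5_000, min_k36_len=5_000):
    """Pair each H3K4me3 (promoter) peak with a downstream H3K36me3 (gene-body) domain.

    Assumes plus-strand genes only; both thresholds are hypothetical.
    """
    calls = []
    for peak in k4me3_peaks:
        for dom in k36me3_domains:
            starts_nearby = peak.start <= dom.start <= peak.end + max_gap
            long_enough = dom.end - dom.start >= min_k36_len
            if dom.chrom == peak.chrom and starts_nearby and long_enough:
                # The call spans from the promoter mark to the end of the K36 domain.
                calls.append(Interval(peak.chrom, peak.start, dom.end))
                break  # keep only the first qualifying domain (in list order)
    return calls


def intergenic_only(calls, known_genes):
    """Drop K4-K36 calls that overlap any annotated gene."""
    return [c for c in calls if not any(overlaps(c, g) for g in known_genes)]


peaks = [Interval("chr1", 1_000, 2_000), Interval("chr2", 100, 600)]
domains = [Interval("chr1", 2_500, 20_000), Interval("chr2", 700, 10_000)]
genes = [Interval("chr2", 5_000, 8_000)]
print(intergenic_only(k4_k36_domains(peaks, domains), genes))
# -> [Interval(chrom='chr1', start=1000, end=20000)]
```

Both K4-K36 calls pass the signature test, but the chr2 call overlaps an annotated gene and is discarded, leaving only the intergenic candidate.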

Chromatin maps revealed that, similar to protein-coding genes,
many ncRNA genes also contain a ‘K4-K36’ signature. By searching
for K4-K36 domains that do not overlap with known genes, chromatin
signatures revealed approximately 1,600 regions in the mouse
genome and approximately 2,500 regions in the human genome that
were actively transcribed. The vast majority of these intergenic
K4-K36 domains produce multi-exonic RNAs that have little capability to
encode a conserved protein. RNAs expressed from these K4-K36
domains were termed large intergenic ncRNAs (lincRNAs) because
identification by this chromatin signature required the RNAs to be
contained within the intergenic regions. Similarly, chromatin-state
maps revealed that active enhancer regions contained short stretches
of H3 lysine 4 monomethylation (H3K4me1) (ref. 43) and the
transcriptional coactivator p300 (ref. 42), as well as additional
modifications (Fig. 1a). By coupling RNA sequencing and chromatin maps,
many of the already identified non-coding transcripts were observed
to be transcribed from active enhancers. However, lincRNAs and
transcripts from enhancer regions are distinct classes, which are
marked by different chromatin signatures. Although it remains to
be determined whether transcripts originating from enhancers have
a function, the functional importance of lincRNAs is becoming
clearer. Several of these lincRNAs have been shown to have
enhancer-like functions, as they activate the expression of
neighbouring genes.

Figure 1 | Layering of genomic regions. a, Genomic regions are
colour-coded by the presence of different genomic annotations: RNA
transcription of a locus (grey), K4-K36 chromatin signature (red),
K4me1 modification and transcriptional activator p300 (green) and
protein-coding potential (blue). By overlaying this information, distinct
transcripts are revealed, including ncRNAs (red), protein-coding genes
(purple) and transcripts from enhancer regions (green). b, A cross-species
alignment of a coding and a non-coding gene. Boxes represent codons,
and each row represents a different aligned species. Blue boxes represent
mutations that cause a synonymous substitution, and red boxes represent
mutations that cause a non-synonymous substitution. A score capturing
the coding potential of a sequence across species aligns sequences in all
frames and scores mutations that maintain coding potential (blue boxes)
relative to mutations that break coding potential (that is, non-synonymous
mutations, stop codons and frameshifting insertions or deletions) (red
boxes). c, The coding potential score is shown for three gene types, SIRT1
(a protein-coding gene), XIST (an ncRNA gene) and tarsal-less (a
small-peptide coding gene), in which positive scores represent coding
regions (blue) and negative scores represent non-coding regions (red). In
each example, the gene structure is shown, where blue boxes represent
known protein-coding exons and red boxes represent non-coding exons.
SIRT1, with an ORF length of 576 amino acids (aa), contains a positive
score over each coding exon but not over the non-coding regions. XIST,
with an ORF length of 172 amino acids, contains negative scores over the
entire transcribed region. tarsal-less, with ORFs of 11 and 32 amino acids,
contains positive scores over all known small peptides.

© 2012 Macmillan Publishers Limited. All rights reserved


Coding potential 
Determining whether a transcript is non-coding is challenging because
a long non-coding transcript is likely to contain an ORF purely by
chance. Accordingly, the evidence for the absence of coding potential
for the XIST and H19 genes came from the lack of evolutionary
conservation of the identified ORFs, the lack of homology to known protein
domains and the inability to template significant protein production.
These principles have been generalized to classify coding potential
across thousands of transcripts by scoring conserved ORFs across
dozens of species, by searching for homology in large protein-domain
databases, and by sequencing RNA associated with polyribosomes.
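The claim that long transcripts carry ORFs by chance is easy to check numerically. The following is a minimal Python sketch, assuming that an ORF is an ATG followed by in-frame codons up to the first stop codon, scanned on the forward strand only (reverse-strand ORFs and ORFs lacking a stop are ignored). `longest_orf` is a hypothetical helper, and the random sequences are drawn uniformly from A, C, G and T, which real transcripts are not.

```python
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> int:
    """Length in codons (excluding the stop) of the longest forward-strand ATG..stop ORF."""
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # the first ATG after a stop gives the longest ORF to the next stop
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None
    return best


# How long is the longest ORF in a random 2-kb sequence? Seeded for reproducibility.
rng = random.Random(0)
lengths = [longest_orf("".join(rng.choice("ACGT") for _ in range(2_000))) for _ in range(100)]
print(f"mean longest ORF in random 2-kb sequences: {sum(lengths) / len(lengths):.1f} codons")
```

Because ORF length alone says little, the text relies instead on conservation, domain homology and protein production as evidence.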

Computational methods such as the ‘codon substitution frequency’
algorithm leverage evolutionary information to determine whether
an ORF is conserved across species and provide a general strategy for
determining coding potential (Fig. 1b, c). Owing to the large number
of available genome sequences, these methods have been used to
accurately determine conserved coding potential in regions as small as 5
amino acids, which makes them extremely sensitive to potentially
small peptides, such as the 11-amino-acid peptide encoded by the
tarsal-less gene (Fig. 1c). Despite their sensitivity, conservation-based
methods may fail to detect newly evolved proteins because these do not
contain a conserved ORF. However, many ncRNAs show clear evolutionary
constraint but no evolutionarily conserved ORF, which indicates that
the observed evolutionary selection is not due to a newly evolved
protein.
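The scoring idea in Fig. 1b can be illustrated with a deliberately simplified score. This Python sketch is not the published codon substitution frequency (CSF) algorithm, which weights substitutions by empirically derived codon substitution frequencies. Here, as an assumption, each codon change in a gapped alignment simply scores +1 if it is synonymous and -1 if it changes the amino acid (including gaining or losing a stop codon), and frameshifting indels are not modelled. `score_frame` and `coding_score` are hypothetical names.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def score_frame(ref: str, others: list[str], frame: int) -> int:
    """Toy conservation score for one reading frame of an alignment.

    +1 for each codon change that keeps the amino acid (synonymous),
    -1 for each change that alters it, including gaining or losing a stop codon.
    Identical codons and codons containing gaps or ambiguous bases are skipped.
    """
    score = 0
    for other in others:
        length = min(len(ref), len(other))
        for i in range(frame, length - 2, 3):
            a, b = ref[i:i + 3].upper(), other[i:i + 3].upper()
            if a == b or a not in CODON_TABLE or b not in CODON_TABLE:
                continue
            score += 1 if CODON_TABLE[a] == CODON_TABLE[b] else -1
    return score


def coding_score(ref: str, others: list[str]) -> int:
    """Best score over the three forward frames; positive values suggest coding."""
    return max(score_frame(ref, others, f) for f in range(3))


print(coding_score("ATGGCTAAA", ["ATGGCCAAG"]))  # -> 2
```

In the example, both differences are third-position changes that keep the amino acid, so the correct frame scores +2 while the two shifted frames score -1.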

Experimental methods, such as ribosome profiling, have provided
a strategy for identifying ribosome occupancy on RNA, which has
been proposed as a method for distinguishing between coding and
non-coding transcripts. However, this still needs to be tested, because
non-coding transcripts that show an association with the ribosome have
not been shown to have a protein product. Importantly, an association
of RNA with a ribosome alone cannot be taken as evidence of
protein-coding potential, because both the H19 and TUG1 ncRNAs can be
detected in the ribosome despite having clear roles as ncRNAs.

An alternative explanation for these observed associations is
‘translational noise’: spurious association that may lead to
non-functional translation products. Consistent with this, virtually all of
the transcripts that have been suggested to encode small peptides by
ribosome profiling lack evolutionary conservation of their proposed
coding regions, in striking contrast to almost all known protein-coding
genes, including the few well-characterized functional small
peptides (Fig. 1c). Accordingly, identification of any new protein-coding
gene requires clear demonstration of the function of the
protein product in vivo.


Global identification of ncRNA function 

Identifying the functional role of an ncRNA requires direct perturbation
experiments, such as loss-of-function and gain-of-function studies.
Individual ncRNAs involved in specific processes have been functionally
characterized (see ref. 63 for a review). For example, XIST is crucial for
random inactivation of the X chromosome; Air is crucial for imprinting
control at the Igf2r locus; HOTAIR affects expression of the HOXD gene
family, as well as of other genes throughout the genome; HOTTIP
affects expression of the HOXA gene family; lincRNA-RoR affects
reprogramming efficiency; NRON affects NFAT transcription factor
activity; and Tug1 affects retina development through the regulation of
the cell cycle. Although there are now many examples of large ncRNAs
that are required for the correct regulation of gene expression, this is
just one of the many functions in which they are involved, ranging from
telomere replication to translation.

The global characterization of ncRNA function has proved to be
challenging because, in most cases, it is unclear which phenotype to
investigate. One approach to classifying the putative function of
ncRNAs uses ‘guilt-by-association’. This approach associates ncRNAs with
biological processes based on a common expression pattern across cell
types and tissues (Fig. 2a) and can therefore identify groups of
ncRNAs that are associated with specific cellular processes (Fig. 2b). This
approach has been used to predict roles for hundreds of ncRNAs in
diverse biological processes, such as stem cell pluripotency, immune
responses, neural processes and cell-cycle regulation.
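A bare-bones version of the association step in Fig. 2a can be written as a correlation score. This Python sketch is a simplified stand-in, not the published method; the original lincRNA analysis used a gene-set enrichment approach over many samples. Here, as an assumption, each functional term is scored by the mean Pearson correlation between the ncRNA and the term's member genes across samples. The gene names, pathways and values are hypothetical toy data.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; assumes equal lengths and non-constant inputs."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def association_scores(ncrna_expr, gene_expr, gene_sets):
    """Mean correlation between one ncRNA and the genes of each functional term.

    ncrna_expr: expression of the ncRNA across samples (cell types, tissues)
    gene_expr:  {gene name: expression across the same samples}
    gene_sets:  {term: [gene names]}
    Positive scores suggest association; negative scores suggest anti-association.
    """
    scores = {}
    for term, genes in gene_sets.items():
        rs = [pearson(ncrna_expr, gene_expr[g]) for g in genes if g in gene_expr]
        scores[term] = sum(rs) / len(rs) if rs else 0.0
    return scores


ncrna = [1.0, 2.0, 3.0, 4.0]
genes = {"A1": [2.0, 4.0, 6.0, 8.0], "A2": [1.0, 3.0, 5.0, 7.0], "B1": [4.0, 3.0, 2.0, 1.0]}
terms = {"pathway_A": ["A1", "A2"], "pathway_B": ["B1"]}
print(association_scores(ncrna, genes, terms))
```

In the toy data, the ncRNA rises with both pathway_A genes and falls as the pathway_B gene rises, mirroring the positive (red) and negative (blue) associations in Fig. 2a.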

Although these correlations cannot prove that ncRNAs have a function
in these processes, they do provide hypotheses for targeted
loss-of-function experiments. For example, lincRNA-p21 was predicted to be
associated with the p53-mediated DNA damage response, and indeed
lincRNA-p21 was found to be a target of p53 and, on perturbation, was
shown to regulate apoptosis in response to DNA damage. In the same
way, the ncRNA PANDA (p21 associated ncRNA DNA damage activated)
was implicated in, and shown to function in, the regulation of
apoptosis. Another ncRNA, lincEnc1 (ref. 25), was predicted to
have a role in cell-cycle regulation in embryonic stem (ES) cells and has
been shown in a separate study to affect the proliferation of ES cells.

Alternatively, global approaches can be used to determine function,
such as systematic RNA interference (RNAi) knockdown followed by
gene-expression profiling. Unlike correlation analysis, these
perturbation-based experiments provide evidence for the function of an ncRNA.
Methods to classify function using this approach are conceptually
similar to guilt-by-association because the function can be inferred on the
basis of the genes that are affected by loss of function of ncRNAs. A
systematic perturbation study demonstrated that knockdown of the
vast majority of lincRNAs expressed in ES cells had a major effect on
gene expression. The gene-expression signatures revealed dozens of
lincRNAs that block key lineage-commitment programs within ES
cells and function in crucial ES cell regulatory and signalling pathways.
Importantly, this study also identified 26 lincRNAs that are required to
maintain the pluripotent state.
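Inferring function from the genes affected by a knockdown is often framed as an overlap test between the affected genes and a functional gene set. The Python sketch below uses a one-sided hypergeometric test as an illustrative choice; it is an assumption, not necessarily the statistic used in the study described above. `hypergeom_sf` and `enrichment` are hypothetical helpers, and the gene identifiers are toy data.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K genes in the set, n drawn)."""
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def enrichment(affected, gene_set, universe):
    """Overlap between knockdown-affected genes and a functional gene set.

    Returns (overlap size, one-sided hypergeometric p-value). Both gene lists
    are first restricted to `universe`, the genes actually measured.
    """
    universe = set(universe)
    affected = set(affected) & universe
    gene_set = set(gene_set) & universe
    k = len(affected & gene_set)
    return k, hypergeom_sf(k, len(universe), len(gene_set), len(affected))


universe = {f"g{i}" for i in range(10)}
print(enrichment(["g0", "g1", "g2", "g3", "g4"], ["g0", "g1", "g2", "g3", "g4"], universe))
```

Restricting both lists to the measured universe matters: genes that could never have been detected as affected should not count against, or in favour of, a term.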

Not all non-coding transcripts are functional RNA molecules. Several
examples of intergenic transcription have been identified in which the
process of transcription alone changes the chromatin- and
transcription-factor-binding landscape to allow activation and repression of
neighbouring genes. Methods that degrade RNA after its transcription,
such as RNAi, can distinguish between a functional RNA molecule and
the process of transcription: if only the act of transcription matters,
degrading the RNA should have no observable effect. Collectively, the
genome-wide guilt-by-association approach and targeted and global
perturbation studies have demonstrated that large ncRNAs have a crucial
regulatory role in diverse biological processes.


cis- versus trans-regulatory mechanisms 

The discovery that the XIST product was an ncRNA led immediately
to the suggestion of a model for how it could function in an
allele-specific manner. In theory, an ncRNA has an intrinsic cis-regulatory
capacity because it can function while remaining tethered to its own
locus (Fig. 2c), whereas an mRNA must be dissociated, exported
and translated for it to function. Here we define a cis-regulator as
one that exerts its function on a neighbouring gene on the same allele
from which it is transcribed, and define a trans-regulator as one that
does not meet this criterion. Owing to the unique cis-regulatory
capability of ncRNAs, it has been speculated that cis-regulation
could be a common mechanism for large ncRNAs. However,
global functional evidence strongly suggests that this is not the
case (Box 1).

To distinguish cis- from trans-regulatory models, initial studies
used correlation analysis and identified a significant correlation
of expression between ncRNAs and their neighbouring protein-coding
genes. However, several of these cases have been demonstrated to be
trans-regulatory, and the apparent correlations are due to shared
upstream regulation (for example, lincRNA-p21 (ref. 26) and
lincRNA-Sox2 (ref. 25)), positional correlation (for example, HOTAIR),
transcriptional ‘ripple effects’ and indirect regulation of neighbouring
genes (Box 1). Consistent with these explanations, a recent study
showed that the increased correlation of expression between ncRNAs
and their neighbouring genes is comparable to that observed for
protein-coding genes.

Figure 2 | Classification of ncRNA function. a, Illustration of an ncRNA
with expression patterns related to the NF-κB pathway. Each row represents
a gene, and a positive association (red box) is assigned between the ncRNA
and the pathway based on the correlation of the genes in the process.
Similarly, the ncRNA is assigned a negative association (blue box) with the
p53 pathway based on anticorrelation with the genes in the process. b, The
scores for each functional term and ncRNA can be clustered to identify
classes of ncRNAs. In this example (adapted, with permission, from ref. 25)
each column represents a different ncRNA, and each row represents a
different functional term. c, A model of ncRNAs that have a cis-function
by remaining tethered to their site of transcription. In this model, RNA
polymerase (green) transcribes an RNA (red), which can associate with
regulatory proteins (purple) to affect neighbouring regions, as proposed for
XIST. d, One model for ncRNA trans-regulation. In this model an ncRNA
can associate with DNA-binding proteins (blue) and regulatory proteins to
localize and affect the expression of the targets, as proposed for HOTAIR.
e, A model for ncRNAs that bind regulatory proteins and change their
activity, in this case leading to a change in modification state and expression
of the target gene, as proposed for the CCND1 ncRNAs, which interact with
the TLS protein. f, A model for ncRNAs that act as ‘decoys’. In this model,
ncRNAs bind protein complexes and prevent them from binding to their
proper regulatory targets, as proposed for GAS5 and PANDA.

BOX 1
Distinguishing cis- from trans-regulation
…neighbouring genes. Known ncRNA examples of each of these regulatory
models are shown to the right of the figure.

Recently, loss-of-function experiments have been used to investigate
cis- versus trans-effects of lincRNAs. One study knocked down
seven lincRNAs and identified no effects on neighbouring genes
but did find effects on other genes. A second study knocked
down 12 lincRNAs, 7 of which had modest effects on some of the
genes within a wide genomic neighbourhood. More recently, a
systematic study knocked down approximately 150 lincRNAs and
identified no effect on the neighbouring genes for about 95% of
the lincRNAs, which is similar to that observed for protein-coding
genes.
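The genome-wide tally in such knockdown studies reduces to a simple question per lincRNA: did any gene within some window of the lincRNA change? The Python sketch below assumes one representative coordinate per gene and a hypothetical 300-kb window; the function names and toy coordinates are also hypothetical. Note that such a count only detects effects on neighbours and cannot, by itself, establish cis-regulation in the allele-specific sense defined above.

```python
def neighbours(locus, genes, window):
    """Names of genes whose position lies within `window` bp of `locus`."""
    chrom, pos = locus
    return {name for name, (g_chrom, g_pos) in genes.items()
            if g_chrom == chrom and abs(g_pos - pos) <= window}


def fraction_with_neighbour_effect(knockdowns, lincrna_loci, genes, window=300_000):
    """Fraction of knocked-down lincRNAs that changed at least one neighbouring gene.

    knockdowns:   {lincRNA: genes significantly changed after its knockdown}
    lincrna_loci: {lincRNA: (chromosome, position)}
    genes:        {gene: (chromosome, position)}
    window:       hypothetical neighbourhood size in bp
    """
    hits = sum(
        1 for linc, affected in knockdowns.items()
        if neighbours(lincrna_loci[linc], genes, window) & set(affected)
    )
    return hits / len(knockdowns)


genes = {"A": ("chr1", 100_000), "B": ("chr1", 2_000_000), "C": ("chr2", 50_000)}
loci = {"linc1": ("chr1", 150_000), "linc2": ("chr2", 900_000)}
knockdowns = {"linc1": ["A"], "linc2": ["B", "C"]}
print(fraction_with_neighbour_effect(knockdowns, loci, genes))  # -> 0.5
```

Here linc1 changes a gene 50 kb away, whereas linc2 affects only distant genes, the pattern the text describes for most lincRNAs.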

Although perturbation experiments can demonstrate that an
RNA functions as a trans-regulator, evidence for RNA acting as
a cis-regulator is more difficult to obtain (Box 1). For example,
perturbation experiments demonstrated that the ncRNA from JPX
affects the expression of the neighbouring XIST gene, but as a
trans-regulator. Conclusive proof of cis-regulation requires the
demonstration that an RNA regulates a neighbouring gene on the same
allele (Box 1). So far, few studies have performed this test, and it
is unclear what percentage of ncRNAs that are suggested to have a
cis-function by loss-of-function experiments will pass this test.
Together, these studies indicate that although some ncRNAs are
cis-regulators, the vast majority of those identified and characterized
so far function as trans-regulators.


Formation of RNA-protein interactions 

The precise mechanism by which ncRNAs function remains poorly
understood. However, one emerging theme is the interaction between
ncRNAs and protein complexes. The functional importance of many
ncRNA-protein interactions for correct transcriptional regulation
has been demonstrated, including several ncRNAs that are
required for the correct localization of chromatin proteins to genomic
DNA targets.


The XIST ncRNA is a key example demonstrating that RNA can play a
direct role in silencing large genomic regions by physically interacting
with the polycomb complex, leading to the condensation of chromatin
and transcriptional repression of an entire X chromosome (Fig. 2c).
Similar to XIST, many ncRNAs have been identified that physically
associate with chromatin-regulatory complexes and ‘guide’ the
associated complexes to specific genomic DNA regions, including HOTAIR,
AIR, KCNQ1ot1 (ref. 75) and lincRNA-p21 (ref. 26) (Fig. 2d).

Biochemical evidence has demonstrated that many large ncRNAs
interact with chromatin regulators. The precise numbers vary
depending on the experimental approach, but a conservative
estimate suggests that at least 30% of lincRNAs associate with at least 1 of
12 distinct chromatin-regulatory complexes, which include readers,
writers and erasers of chromatin modifications.

Importantly, lincRNAs can provide regulatory specificity to these
complexes because the knockdown of these lincRNAs affects a
subset of the genes that are normally regulated by these complexes.
One hypothesis is that ncRNAs provide regulatory specificity by
localizing chromatin-regulatory complexes to genomic DNA
targets. Several methods have been developed to generate
maps of RNA-DNA proximity, but it remains to be determined
what percentage of ncRNAs localize to genomic DNA regions and
how these interactions occur.

In addition to their role in chromatin regulation, ncRNAs can also
modulate the regulatory activity of protein complexes (Fig. 2e). For
example, an ncRNA upstream of cyclin D1 can bind to the TLS
(translocated in liposarcoma) RNA-binding protein, which changes it from an
inactive to an active state. Similarly, the NRON ncRNA can bind to the
transcription factor NFAT (nuclear factor of activated T cells), rendering
it inactive by preventing its nuclear accumulation. ncRNAs can also
function as molecular ‘decoys’ that prevent correct regulation through
competitive binding (Fig. 2f). For example, the GAS5 ncRNA binds to
the glucocorticoid receptor and prevents the receptor from binding to
its correct regulatory elements, and the PANDA ncRNA can prevent
NF-Y localization, a process that leads to apoptosis. Similarly, several
studies have shown that ncRNAs can function as decoys for other RNA
species, such as miRNAs, to control miRNA levels.
Figure 3 | Modular principles of large ncRNA genes. a, The four principles
of nucleic acid and protein interactions: (1) RNA-protein interactions, (2)
DNA-RNA hybridization-based interactions, (3) DNA-protein interactions
and (4) RNA-RNA hybridization-based interactions. b, Each of these principles
can be combined to build distinct complexes. For example, combining
RNA-protein and RNA-DNA interactions can localize a protein complex to a
specific DNA sequence in an RNA-dependent manner, as has been implicated
for the DHFR promoter and localization of DNMT3b. Combining RNA-protein
and protein-DNA principles can also localize a diverse set of proteins, which
have a molecular scaffold created by RNA, to a specific DNA sequence in a
protein-dependent manner. The ribosome is a multifaceted combination
of RNA-protein interactions that facilitate correct RNA-RNA interactions
for the ribozyme activity of the ribosome. The telomere replication activity
of telomerase is an example of combining RNA-protein, RNA-DNA and
protein-DNA interactions.


Large ncRNAs as molecular scaffolds of proteins 

One emerging theme common to many large ncRNAs is the formation
of multiple distinct RNA-protein interactions that are used to carry
out their function (Fig. 3). The first indication of this phenomenon
came from the discovery of telomerase. Telomerase activity requires
a telomerase RNA component (TERC), which serves as a template
for telomeric DNA synthesis and as a molecular scaffold for the polymerase
enzyme around the RNA (Fig. 3b). Importantly, genetic studies
demonstrated that TERC plays a modular functional role, as genetically
swapping particular domains of TERC retained the overall function.
This indicated that TERC was made up of discrete functional modules
that bring multiple proteins into proximity with one another.

More recently, HOTAIR was shown to contain distinct
protein-interaction domains that can associate with polycomb repressive
complex 2 (PRC2) (ref. 14) and the CoREST-LSD1 complex, which
together are required for correct function (Fig. 3b). XIST also has
discrete functional domains. Through a series of genetic deletions, XIST
was shown to contain at least two discrete domains that are responsible
for silencing (RepA) and localization (RepC) (Fig. 3b). These
functional domains could be independently deleted without affecting the
role of the other domain, which suggests that the XIST ncRNA is
modular. These functional domains of XIST also interact with discrete
proteins: the silencing domain (RepA) binds to PRC2 and the
localization domain (RepC) binds to YY1 (ref. 96) and hnRNPU. These
examples show that large ncRNAs can function as molecular scaffolds
of protein complexes. Importantly, this phenomenon is likely to be a
general one, because approximately 30% of ES cell lincRNAs associate
with multiple regulatory complexes.

In addition to interacting with multiple proteins, ncRNAs have in
several cases been shown to interact directly with both DNA and RNA.
For example, ncRNAs can form triplex structures with DNA (Fig. 3a), such
as an ncRNA that binds to the ribosomal DNA promoter and interacts
with the DNMT3b protein to silence expression. Furthermore, RNA
can form traditional duplex base-pairing interactions with DNA, a
property that has long been speculated for large ncRNAs. Finally, RNA
can form base-pair interactions with RNA (Fig. 3a), which are crucial for
processes such as tRNA-mRNA anticodon recognition, ribonuclease
P recognition of pre-tRNAs, miRNA targeting, ribosome structure
as a ribozyme and splicing regulation. Despite these examples, the
interactions between large ncRNAs, genomic DNA and other RNAs
are not well characterized.
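At the sequence level, the simplest form of duplex base pairing is perfect antiparallel Watson-Crick complementarity. The Python sketch below finds where an RNA segment could pair with a DNA strand under that assumption only. It ignores G-U wobble pairs, mismatches, RNA structure, chromatin accessibility and triplex (Hoogsteen) pairing, all of which matter in vivo. `complementary_sites` is a hypothetical helper.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complementary_sites(rna: str, dna: str) -> list[int]:
    """0-based starts in `dna` (written 5'->3') that perfectly pair with `rna` (5'->3').

    The partner strand of an RNA is its reverse complement, with U read as T.
    """
    probe = rna.upper().replace("U", "T").translate(COMPLEMENT)[::-1]
    dna = dna.upper()
    return [i for i in range(len(dna) - len(probe) + 1) if dna[i:i + len(probe)] == probe]


print(complementary_sites("AUGC", "TTGCATGCAT"))  # -> [2, 6]
```

The same check applies to an RNA target once its U bases are written as T, which is the RNA-RNA case in Fig. 3a.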


A potential modular RNA code 

Collectively, the studies reviewed here suggest an intriguing
hypothesis: large ncRNAs are flexible modular scaffolds. In this model,
RNA contains discrete domains that interact with specific protein
complexes. Through a combination of domains, these RNAs bring specific
regulatory components into proximity with each other, which results in
the formation of a unique functional complex. These RNA regulatory
complexes can include interactions with proteins but can also extend
to RNA-DNA and RNA-RNA regulatory interactions.
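The hypothesis makes a concrete, testable prediction: deleting or swapping one domain should change only the partners recruited by that domain. The following toy Python data model expresses that prediction and is not a biophysical model. The class and method names are hypothetical, and the domain-to-partner assignments simply restate the XIST example given earlier in the text.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Domain:
    name: str
    partners: tuple[str, ...]  # binding partners assumed for this domain


@dataclass
class ScaffoldRNA:
    name: str
    domains: list[Domain] = field(default_factory=list)

    def assembled_partners(self) -> set[str]:
        """Partners brought together when every domain is bound."""
        return {p for d in self.domains for p in d.partners}

    def delete(self, domain_name: str) -> "ScaffoldRNA":
        """Model a genetic deletion of one domain (returns a new scaffold)."""
        return ScaffoldRNA(self.name, [d for d in self.domains if d.name != domain_name])

    def swap(self, domain_name: str, new_domain: Domain) -> "ScaffoldRNA":
        """Model a domain-swapping experiment (returns a new scaffold)."""
        return ScaffoldRNA(
            self.name, [new_domain if d.name == domain_name else d for d in self.domains]
        )


# Domain-partner assignments follow the XIST example in the text.
xist = ScaffoldRNA("XIST", [Domain("RepA", ("PRC2",)), Domain("RepC", ("YY1", "hnRNPU"))])
print(sorted(xist.delete("RepA").assembled_partners()))  # -> ['YY1', 'hnRNPU']
```

Under strict modularity, removing RepA loses PRC2 recruitment while leaving the RepC partners intact. An experiment showing otherwise would argue against a purely modular architecture.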

RNA is well suited for this role because it is a more malleable
evolutionary substrate than protein, allowing for the selection of discrete
interaction domains. Specifically, RNA can be easily mutated, tested and
selected without breaking its core functionality. This model of modular
interactions can explain the observation that there are highly conserved
‘patches’ within large ncRNA genes that could have evolved for
specific protein interactions. The remaining regions may be more
evolutionarily flexible, allowing the formation of new functional domains by
random mutation and selection. This is consistent with the observation
that non-constrained regions of telomerase RNA are dispensable.

The model of RNA as a modular scaffold is not limited to protein 
interactions. RNA can also base-pair with DNA, which might be used to 
guide complexes to specific DNA sequences. Alternatively, RNAs might 
guide complexes by bridging together sets of DNA-binding proteins. 
Such a model could explain how the same protein complexes are guided 
to different DNA loci in distinct cell types. 

Large ncRNAs can also form RNA-RNA interactions, raising intrigu- 
ing possibilities for future investigations. For example, two large RNA 
molecular scaffolds might be linked through RNA-RNA interactions. 
Another possibility is that RNA-RNA interactions could result in
unique RNA structures that can interact with protein complexes that
are not attainable by the individual units. This has been observed in
the ribosome, where the combination of RNA-RNA and RNA-protein
interactions is required for correct complex formation.


Outlook 

We are only beginning to understand the mechanism by which large
ncRNAs carry out their regulatory function. A modular RNA regulatory
code is an attractive hypothesis but remains to be tested; in
particular, the ways in which large ncRNAs and proteins interact, and the
underlying molecular principles, are still unknown. Understanding these
principles will require the identification of the sites of RNA-protein
interactions and the exact RNA-binding proteins in vivo. Furthermore,
the way in which large ncRNAs localize to their target genes is unknown
but could involve direct RNA-DNA interactions (Fig. 3a) or
interactions with proteins that contain DNA recognition elements, as has
been suggested for XIST and HOTAIR. To gain insight into these
processes, it will be important to catalogue the interactions that ncRNAs
form with genomic DNA and RNAs. These data will help elucidate the
rules that guide these interactions as well as the functional implications
of these associations, which can then be tested experimentally.

If large ncRNAs are truly modular, then each individual domain 
would have a unique function that is independent of other domains. 
Demonstrating modularity will require the genetic deletion of domains 
and spacer regions, as well as domain-swapping experiments. Learning 
these principles would result in a defined ‘modular RNA code’ for how 
RNAs can affect cell states. By truly understanding this modular RNA 
code, it may be possible to create synthetically engineered RNAs that 
could interact with both nucleic acids and protein modules to carry 
out engineered regulatory roles. However, at present, it is premature
to dismiss the possibility that large ncRNAs have other mechanisms
of action that may not fit neatly into this modular RNA code. In the
meantime, it is clear that mammalian genomes encode a diverse set of
important large ncRNAs.


1. Warner, J. R., Soeiro, R., Birnboim, H. C., Girard, M. & Darnell, J. E. Rapidly
labeled HeLa cell nuclear RNA. I. Identification by zone sedimentation of a
heterogeneous fraction separate from ribosomal precursor RNA. J. Mol. Biol.
19, 349-361 (1966).

2. Salditt-Georgieff, M., Harpold, M. M., Wilson, M. C. & Darnell, J. E., Jr. Large
heterogeneous nuclear ribonucleic acid has three times as many 5′ caps
as polyadenylic acid segments, and most caps do not enter polyribosomes.
Mol. Cell. Biol. 1, 179-187 (1981).
This paper demonstrates an abundant class of RNA species that do not enter
polyribosomes.

3. Weinberg, R. A. & Penman, S. Small molecular weight monodisperse nuclear
RNA. J. Mol. Biol. 38, 289-304 (1968).

4. Zieve, G. & Penman, S. Small RNA species of the HeLa cell: metabolism and
subcellular localization. Cell 8, 19-31 (1976).

5. Gesteland, R. F., Cech, T. & Atkins, J. F. The RNA World: The Nature of
Modern RNA Suggests a Prebiotic RNA World. 3rd edn (Cold Spring Harbor
Laboratory Press, 2006).

6. Eddy, S. R. Non-coding RNA genes and the modern RNA world.
Nature Rev. Genet. 2, 919-929 (2001).

7. Pachnis, V., Brannan, C. I. & Tilghman, S. M. The structure and expression of a
novel gene activated in early mouse embryogenesis. EMBO J. 7, 673-681 (1988).

8. Brannan, C. I., Dees, E. C., Ingram, R. S. & Tilghman, S. M. The product of the
H19 gene may function as an RNA. Mol. Cell. Biol. 10, 28-36 (1990).
This paper was the first report of a large ncRNA, showing that the H19
transcript lacked conserved ORFs and did not make a protein product in vivo.

9. Brown, C. J. et al. A gene from the region of the human X inactivation centre is
expressed exclusively from the inactive X chromosome. Nature 349, 38-44 (1991).

10. Penny, G. D., Kay, G. F., Sheardown, S. A., Rastan, S. & Brockdorff, N. Requirement
for Xist in X chromosome inactivation. Nature 379, 131-137 (1996).

11. Sleutels, F., Zwart, R. & Barlow, D. P. The non-coding Air RNA is required for
silencing autosomal imprinted genes. Nature 415, 810-813 (2002).

12. Young, T. L., Matsuda, T. & Cepko, C. L. The noncoding RNA taurine upregulated
gene 1 is required for differentiation of the murine retina. Curr. Biol.
15, 501-512 (2005).

13. Willingham, A. T. et al. A strategy for probing the function of noncoding RNAs
finds a repressor of NFAT. Science 309, 1570-1573 (2005).

14. Rinn, J. L. et al. Functional demarcation of active and silent chromatin domains
in human HOX loci by noncoding RNAs. Cell 129, 1311-1323 (2007).

15. Carninci, P. et al. The transcriptional landscape of the mammalian genome.
Science 309, 1559-1563 (2005).




This paper describes the large-scale cDNA sequencing efforts in the mouse genome and reveals many thousands of non-coding transcripts.

16. Birney, E. et al. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447, 799-816 (2007).

17. Bertone, P. et al. Global identification of human transcribed sequences with genome tiling arrays. Science 306, 2242-2246 (2004).

18. Kapranov, P. et al. RNA maps reveal new RNA classes and a possible function for pervasive transcription. Science 316, 1484-1488 (2007).

19. Rinn, J. L. et al. The transcriptional activity of human Chromosome 22. Genes Dev. 17, 529-540 (2003).

20. Kapranov, P. et al. Large-scale transcriptional activity in chromosomes 21 and 22. Science 296, 916-919 (2002).

21. Ebisuya, M., Yamamoto, T., Nakajima, M. & Nishida, E. Ripples from neighbouring transcription. Nature Cell Biol. 10, 1106-1113 (2008).

22. Struhl, K. Transcriptional noise and the fidelity of initiation by RNA polymerase II. Nature Struct. Mol. Biol. 14, 103-105 (2007).

23. Guttman, M. et al. lincRNAs act in the circuitry controlling pluripotency and differentiation. Nature 477, 295-300 (2011).

24. Ørom, U. A. et al. Long noncoding RNAs with enhancer-like function in human cells. Cell 143, 46-58 (2010).

25. Guttman, M. et al. Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature 458, 223-227 (2009).
This paper applied a chromatin signature to identify lincRNAs and used a guilt-by-association approach to classify their likely functions in diverse biological processes.

26. Huarte, M. et al. A large intergenic noncoding RNA induced by p53 mediates global gene repression in the p53 response. Cell 142, 409-419 (2010).

27. Hung, T. et al. Extensive and coordinated transcription of noncoding RNAs within cell-cycle promoters. Nature Genet. 43, 621-629 (2011).

28. Wang, K. C. et al. A long noncoding RNA maintains active chromatin to coordinate homeotic gene expression. Nature 472, 120-124 (2011).

29. Wilusz, J. E., Freier, S. M. & Spector, D. L. 3’ end processing of a long nuclear-retained noncoding RNA yields a tRNA-like cytoplasmic RNA. Cell 135, 919-932 (2008).

30. Mortazavi, A., Williams, B. A., McCue, K., Schaeffer, L. & Wold, B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature Methods 5, 621-628 (2008).

31. Guttman, M. et al. Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs. Nature Biotechnol. 28, 503-510 (2010).

32. Cabili, M. N. et al. Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses. Genes Dev. 25, 1915-1927 (2011).

33. Mercer, T. R., Dinger, M. E., Sunkin, S. M., Mehler, M. F. & Mattick, J. S. Specific expression of long noncoding RNAs in the mouse brain. Proc. Natl Acad. Sci. USA 105, 716-721 (2008).

34. De Santa, F. et al. A large fraction of extragenic RNA Pol II transcription sites overlap enhancers. PLoS Biol. 8, e1000384 (2010).

35. Kim, T. K. et al. Widespread transcription at neuronal activity-regulated enhancers. Nature 465, 182-187 (2010).

36. Ravasi, T. et al. Experimental validation of the regulated expression of large numbers of non-coding RNAs from the mouse genome. Genome Res. 16, 11-19 (2006).

37. Ponjavic, J., Ponting, C. P. & Lunter, G. Functionality or transcriptional noise? Evidence for selection within long noncoding RNAs. Genome Res. 17, 556-565 (2007).

38. Taft, R. J. et al. Tiny RNAs associated with transcription start sites in animals. Nature Genet. 41, 572-578 (2009).

39. Seila, A. C. et al. Divergent transcription from active promoters. Science 322, 1849-1851 (2008).

40. Kouzarides, T. Chromatin modifications and their function. Cell 128, 693-705 (2007).

41. Barski, A. et al. High-resolution profiling of histone methylations in the human genome. Cell 129, 823-837 (2007).

42. Visel, A. et al. ChIP-seq accurately predicts tissue-specific activity of enhancers. Nature 457, 854-858 (2009).

43. Heintzman, N. D. et al. Histone modifications at human enhancers reflect global cell-type-specific gene expression. Nature 459, 108-112 (2009).

44. Mikkelsen, T. S. et al. Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature 448, 553-560 (2007).

45. Khalil, A. M. et al. Many human large intergenic noncoding RNAs associate with chromatin-modifying complexes and affect gene expression. Proc. Natl Acad. Sci. USA 106, 11667-11672 (2009).

46. Ernst, J. et al. Mapping and analysis of chromatin state dynamics in nine human cell types. Nature 473, 43-49 (2011).

47. Loewer, S. et al. Large intergenic non-coding RNA-RoR modulates reprogramming of human induced pluripotent stem cells. Nature Genet. 42, 1113-1117 (2010).

48. Dinger, M. E., Pang, K. C., Mercer, T. R. & Mattick, J. S. Differentiating protein-coding and noncoding RNA: challenges and ambiguities. PLoS Comput. Biol. 4, e1000176 (2008).

49. Brockdorff, N. et al. The product of the mouse Xist gene is a 15 kb inactive X-specific transcript containing no conserved ORF and located in the nucleus. Cell 71, 515-526 (1992).

50. Lin, M. F., Deoras, A. N., Rasmussen, M. D. & Kellis, M. Performance and scalability of discriminative metrics for comparative gene identification in 12 Drosophila genomes. PLoS Comput. Biol. 4, e1000067 (2008).

16 FEBRUARY 2012 | VOL 482 | NATURE | 345
© 2012 Macmillan Publishers Limited. All rights reserved

51. Lin, M. F., Jungreis, I. & Kellis, M. PhyloCSF: a comparative genomics method to distinguish protein coding and non-coding regions. Bioinformatics 27, i275-i282 (2011).

52. Finn, R. D. et al. The Pfam protein families database. Nucleic Acids Res. 38, D211-D222 (2010).

53. Ingolia, N. T., Lareau, L. F. & Weissman, J. S. Ribosome profiling of mouse embryonic stem cells reveals the complexity and dynamics of mammalian proteomes. Cell 147, 789-802 (2011).

54. Galindo, M. I., Pueyo, J. I., Fouix, S., Bishop, S. A. & Couso, J. P. Peptides encoded by short ORFs control development and define a new eukaryotic gene family. PLoS Biol. 5, e106 (2007).
This paper demonstrates the existence of functional small peptides within a presumed ‘non-coding’ transcript through ORF conservation, in vivo protein identification and functional analysis.

55. Kondo, T. et al. Small peptides switch the transcriptional activity of Shavenbaby during Drosophila embryogenesis. Science 329, 336-339 (2010).

56. Jiao, Y. & Meyerowitz, E. M. Cell-type specific analysis of translating RNAs in developing flowers reveals new levels of control. Mol. Syst. Biol. 6, 419 (2010).

57. Li, Y. M. et al. The H19 transcript is associated with polysomes and may regulate IGF2 expression in trans. J. Biol. Chem. 273, 28247-28252 (1998).

58. Cai, X. & Cullen, B. R. The imprinted H19 noncoding RNA is a primary microRNA precursor. RNA 13, 313-316 (2007).

59. Yang, L. et al. ncRNA- and Pc2 methylation-dependent gene relocation between nuclear structures mediates gene activation programs. Cell 147, 773-788 (2011).

60. Clamp, M. et al. Distinguishing protein-coding and noncoding genes in the human genome. Proc. Natl Acad. Sci. USA 104, 19428-19433 (2007).

61. Kastenmayer, J. P. et al. Functional genomics of genes with small open reading frames (sORFs) in S. cerevisiae. Genome Res. 16, 365-373 (2006).

62. Hanada, K., Zhang, X., Borevitz, J. O., Li, W. H. & Shiu, S. H. A large number of novel coding small open reading frames in the intergenic regions of the Arabidopsis thaliana genome are transcribed and/or under purifying selection. Genome Res. 17, 632-640 (2007).

63. Mattick, J. S. The genetic signatures of noncoding RNAs. PLoS Genet. 5, e1000459 (2009).

64. Tsai, M. C. et al. Long noncoding RNA as modular scaffold of histone modification complexes. Science 329, 689-693 (2010).
This paper identified multiple protein-interaction domains within HOTAIR that together allowed it to carry out its function, which demonstrated that a large ncRNA can act as a molecular scaffold.

65. Gupta, R. A. et al. Long non-coding RNA HOTAIR reprograms chromatin state to promote cancer metastasis. Nature 464, 1071-1076 (2010).

66. Zappulla, D. C. & Cech, T. R. Yeast telomerase RNA: a flexible scaffold for protein subunits. Proc. Natl Acad. Sci. USA 101, 10024-10029 (2004).
This paper demonstrated that telomerase RNA can bridge proteins by showing that protein interaction domains can be swapped and spacer regions deleted with minimal impact on the function of the RNA.

67. Korostelev, A. & Noller, H. F. The ribosome in focus: new structures bring new insights. Trends Biochem. Sci. 32, 434-441 (2007).

68. Ivanova, N. et al. Dissecting self-renewal in stem cells with RNA interference. Nature 442, 533-538 (2006).

69. Martens, J. A., Laprade, L. & Winston, F. Intergenic transcription is required to repress the Saccharomyces cerevisiae SER3 gene. Nature 429, 571-574 (2004).

70. Schmitt, S., Prestel, M. & Paro, R. Intergenic transcription through a Polycomb group response element counteracts silencing. Genes Dev. 19, 697-708 (2005).

71. Lee, J. T. Lessons from X-chromosome inactivation: long ncRNA as guides and tethers to the epigenome. Genes Dev. 23, 1831-1842 (2009).

72. Ponjavic, J., Oliver, P. L., Lunter, G. & Ponting, C. P. Genomic and transcriptional co-localization of protein-coding and long non-coding RNA pairs in the developing brain. PLoS Genet. 5, e1000617 (2009).

73. Tian, D., Sun, S. & Lee, J. T. The long noncoding RNA, Jpx, is a molecular switch for X chromosome inactivation. Cell 143, 390-403 (2010).

74. Koerner, M. V., Pauler, F. M., Huang, R. & Barlow, D. P. The function of non-coding RNAs in genomic imprinting. Development 136, 1771-1783 (2009).

75. Pandey, R. R. et al. Kcnq1ot1 antisense noncoding RNA mediates lineage-specific transcriptional silencing through chromatin-level regulation. Mol. Cell 32, 232-246 (2008).

76. Bertani, S., Sauer, S., Bolotin, E. & Sauer, F. The noncoding RNA Mistral activates Hoxa6 and Hoxa7 expression and stem cell differentiation by recruiting MLL1 to chromatin. Mol. Cell 43, 1040-1046 (2011).

77. Feng, J. et al. The Evf-2 noncoding RNA is transcribed from the Dlx-5/6 ultraconserved region and functions as a Dlx-2 transcriptional coactivator. Genes Dev. 20, 1470-1484 (2006).

78. Koziol, M. J. & Rinn, J. L. RNA traffic control of chromatin complexes. Curr. Opin. Genet. Dev. 20, 142-148 (2010).

79. Maison, C. et al. Higher-order structure in pericentric heterochromatin involves a distinct pattern of histone modification and an RNA component. Nature Genet. 30, 329-334 (2002).

80. Bernstein, E. et al. Mouse polycomb proteins bind differentially to methylated histone H3 and RNA and are enriched in facultative heterochromatin. Mol. Cell. Biol. 26, 2560-2569 (2006).

81. Wutz, A., Rasmussen, T. P. & Jaenisch, R. Chromosomal silencing and localization are mediated by different domains of Xist RNA. Nature Genet. 30, 167-174 (2002).
This paper reported the generation of deletion mutants across the Xist locus and identified the discrete domains responsible for the silencing and localization roles of the RNA.

82. Chu, C., Qu, K., Zhong, F. L., Artandi, S. E. & Chang, H. Y. Genomic maps of long noncoding RNA occupancy reveal principles of RNA-chromatin interactions. Mol. Cell 44, 667-678 (2011).

83. Simon, M. D. et al. The genomic binding-sites of a non-coding RNA. Proc. Natl Acad. Sci. USA 108, 20497-20502 (2011).

84. Zhao, J., Sun, B. K., Erwin, J. A., Song, J. J. & Lee, J. T. Polycomb proteins targeted by a short repeat RNA to the mouse X chromosome. Science 322, 750-756 (2008).

85. Plath, K., Mlynarczyk-Evans, S., Nusinow, D. A. & Panning, B. Xist RNA and the mechanism of X chromosome inactivation. Annu. Rev. Genet. 36, 233-278 (2002).

86. Nagano, T. et al. The Air noncoding RNA epigenetically silences transcription by targeting G9a to chromatin. Science 322, 1717-1720 (2008).

87. Zhao, J. et al. Genome-wide identification of Polycomb-associated RNAs by RIP-seq. Mol. Cell 40, 939-953 (2010).

88. Kaneko, S. et al. Phosphorylation of the PRC2 component Ezh2 is cell cycle-regulated and up-regulates its binding to ncRNA. Genes Dev. 24, 2615-2620 (2010).

89. Wang, X. et al. Induced ncRNAs allosterically modify RNA-binding proteins in cis to inhibit transcription. Nature 454, 126-130 (2008).

90. Kino, T., Hurt, D. E., Ichijo, T., Nader, N. & Chrousos, G. P. Noncoding RNA Gas5 is a growth arrest- and starvation-associated repressor of the glucocorticoid receptor. Sci. Signal. 3, ra8 (2010).

91. Salmena, L., Poliseno, L., Tay, Y., Kats, L. & Pandolfi, P. P. A ceRNA hypothesis: the Rosetta stone of a hidden RNA language? Cell 146, 353-358 (2011).

92. Cesana, M. et al. A long noncoding RNA controls muscle differentiation by functioning as a competing endogenous RNA. Cell 147, 358-369 (2011).

93. Greider, C. W. & Blackburn, E. H. Identification of a specific telomere terminal transferase activity in Tetrahymena extracts. Cell 43, 405-413 (1985).

94. Feng, J. et al. The RNA component of human telomerase. Science 269, 1236-1241 (1995).

95. Lingner, J. et al. Reverse transcriptase motifs in the catalytic subunit of telomerase. Science 276, 561-567 (1997).

96. Jeon, Y. & Lee, J. T. YY1 tethers Xist RNA to the inactive X nucleation center. Cell 146, 119-133 (2011).

97. Hasegawa, Y., Brockdorff, N., Kawano, S., Tsutui, K. & Nakagawa, S. The matrix protein hnRNP U is required for chromosomal localization of Xist RNA. Dev. Cell 19, 469-476 (2010).

98. Schmitz, K. M., Mayer, C., Postepska, A. & Grummt, I. Interaction of noncoding RNA with the rDNA promoter mediates recruitment of DNMT3b and silencing of rRNA genes. Genes Dev. 24, 2264-2269 (2010).

99. Martianov, I., Ramadass, A., Serra Barros, A., Chow, N. & Akoulitchev, A. Repression of the human dihydrofolate reductase gene by a non-coding interfering transcript. Nature 445, 666-670 (2007).

100. Bartel, D. P. MicroRNAs: target recognition and regulatory functions. Cell 136, 215-233 (2009).


Acknowledgements We thank M. Cabili, J. Engreitz, M. Garber, P. McDonel and A. Pauli for their reading and suggestions; T. Cech for comments and suggestions; E. Lander for helpful discussions and ideas; and S. Knemeyer and L. Gaffney for assistance with figures in this Review.


Author Information Reprints and permissions information is available at www.nature.com/reprints. The authors declare no competing financial interests. Readers are welcome to comment on the online version of this article at www.nature.com/nature. Correspondence should be addressed to M.G. (mguttman@mit.edu) and J.L.R. (john_rinn@harvard.edu).


REVIEW 


doi:10.1038/nature10888 


The microcosmos of cancer 


Amaia Lujambio & Scott W. Lowe


The discovery of microRNAs (miRNAs) almost two decades ago established a new paradigm of gene regulation. During 
the past ten years these tiny non-coding RNAs have been linked to virtually all known physiological and pathological 
processes, including cancer. In the same way as certain key protein-coding genes, miRNAs can be deregulated in cancer, 
in which they can function as a group to mark differentiation states or individually as bona fide oncogenes or tumour 
suppressors. Importantly, miRNA biology can be harnessed experimentally to investigate cancer phenotypes or used
therapeutically as a target for drugs or as the drug itself.


MicroRNAs (miRNAs) are small, evolutionarily conserved, non-coding
RNAs of 18-25 nucleotides in length that have an important function in
gene regulation. Mature miRNA products are generated from a longer
primary miRNA (pri-miRNA) transcript through sequential processing by
the ribonucleases Drosha and Dicer1 (ref. 1). The first description of
miRNAs was made in 1993 in Caenorhabditis elegans as regulators of
developmental timing”. Later, miRNAs were shown to inhibit their target
genes through sequences that are complementary to the target messenger
RNA, leading to decreased expression of the target protein’ (Box 1). This
discovery resulted in a paradigm shift in our understanding of gene
regulation because miRNAs are
now known to repress thousands of target genes and coordinate normal 
processes, including cellular proliferation, differentiation and apoptosis. 
The aberrant expression or alteration of miRNAs also contributes to a 
range of human pathologies, including cancer. 

The control of gene expression by miRNAs is a process seen in virtu- 
ally all cancer cells. These cells show alterations in their miRNA expres- 
sion profiles, and emerging data indicate that these patterns could be 
useful in improving the classification of cancers and predicting their 
behaviour. In addition, miRNAs have now been shown to behave as 
cancer drivers’ in the same way as protein-coding genes whose altera- 
tions actively and profoundly contribute to malignant transformation 
and cancer progression. Owing to the capacity of miRNAs to modulate 
tens to hundreds of target genes, they are emerging as important factors 
in the control of the ‘hallmarks’ of cancer’. In this Review, we summarize 
the findings that provide evidence for the central role of miRNAs in 
controlling cellular transformation and tumour progression. We also 
highlight the potential uses of miRNAs and miRNA-based drugs in 
cancer therapy and discuss the obstacles that will need to be overcome. 


miRNAs are cancer genes 

In 2002, Croce and colleagues first demonstrated that an miRNA cluster 
was frequently deleted or downregulated in chronic lymphocytic leukae- 
mia’. This discovery suggested that non-coding genes were contributing 
to the development of cancer, and paved the way for the closer investiga- 
tion of miRNA loss or amplification in tumours. Subsequently, miRNAs 
were shown to be differentially expressed in cancer cells, in which they 
formed distinct and unique miRNA expression patterns®, and whole 
classes of miRNAs could be controlled directly by key oncogenic tran- 
scription factors’. In parallel, studies with mouse models established that 
miRNAs were actively involved in tumorigenesis®. Collectively, these find- 
ings provided the first key insights into the relevance of miRNA biology 
in human cancer. 


Despite these results, the sheer extent of involvement of miRNAs in 
cancer was not anticipated. miRNA genes are usually located in small 
chromosomal alterations in tumours (in amplifications, deletions or 
linked to regions of loss of heterozygosity) or at common chromosomal
breakpoints that are associated with the development of cancer’. In
addition to structural genetic alterations, miRNAs can also be silenced 
by promoter DNA methylation and loss of histone acetylation”. Inter- 
estingly, somatic translocations in miRNA target sites can also occur, 
representing a drastic means of altering miRNA function’. The frequent
deregulation of individual miRNAs or miRNA clusters at multiple levels
mirrors the deregulation of protein-coding oncogenes and tumour
suppressors (Table 1).

In principle, somatic mutations that change an miRNA seed sequence 
could lead to the aberrant repression of tumour-suppressive mRNAs, 
but these seem to be infrequent”’. Further sequencing could change this 
view, but this observation suggests that the intensity of miRNA signal- 
ling (altered by miRNA overexpression or underexpression) is more 
crucial than the specificity of the response. However, recent data indi- 
cate that miRNAs with an altered sequence can be produced through 
variable cleavage sites for Drosha and Dicer1, and that the presence of
these variants can be perturbed in cancer™. Although the function of the 
variant ‘isomiRs’ remains unclear, in principle they could alter the qual- 
ity of miRNA effects. State-of-the-art sequencing techniques will help 
to unmask mutations or modifications that otherwise would remain 
undetected. Whatever the mechanism, the widespread alteration in the 
expression of miRNAs is a ubiquitous feature of cancer. 
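Box 1 defines the seed as positions 2 to 8 from the 5’ end of the mature miRNA. The paragraph above raises the possibility that a seed mutation, or an isomiR produced by a shifted Drosha or Dicer1 cleavage site, could change which mRNAs are repressed. The short Python sketch below illustrates that reasoning under loudly stated simplifying assumptions:

- A target site is treated as a perfect Watson-Crick match to the reverse complement of the seed. G:U wobble, 3’ supplementary pairing, site context and every other feature that real target-prediction tools weigh are ignored.
- The mature let-7a sequence used is an assumption.
- The 3’ UTR fragment is hypothetical.
- The helper names (`seed`, `reverse_complement`, `seed_match_sites`) are illustrative and are not taken from any published tool.

```python
# Toy sketch of miRNA seed matching. Assumptions: perfect Watson-Crick
# pairing only; no G:U wobble, 3' supplementary pairing or site context.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def normalise(rna: str) -> str:
    """Upper-case a sequence and write it as RNA (T -> U)."""
    return rna.upper().replace("T", "U")


def seed(mirna: str) -> str:
    """Return the seed: nucleotides 2-8 (1-based) from the miRNA 5' end."""
    mirna = normalise(mirna)
    if len(mirna) < 8:
        raise ValueError("sequence too short to contain positions 2-8")
    return mirna[1:8]


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA string; raises KeyError on non-ACGU bases."""
    return "".join(COMPLEMENT[base] for base in reversed(normalise(rna)))


def seed_match_sites(mirna: str, utr: str) -> list[int]:
    """0-based start positions in `utr` that pair perfectly with the seed.

    miRNA and target pair antiparallel, so the site written 5'->3' on the
    mRNA is the reverse complement of the seed.
    """
    site = reverse_complement(seed(mirna))
    utr = normalise(utr)
    return [i for i in range(len(utr) - len(site) + 1)
            if utr[i:i + len(site)] == site]


if __name__ == "__main__":
    let7a = "UGAGGUAGUAGGUUGUAUAGUU"   # assumed mature let-7a sequence
    isomir = let7a[1:]                  # hypothetical isomiR: 5' end shifted by 1 nt
    utr = "AAACUACCUCAGGGCUACCUCUU"    # hypothetical 3' UTR fragment

    print(seed(let7a), reverse_complement(seed(let7a)))  # GAGGUAG CUACCUC
    print(seed_match_sites(let7a, utr))                  # [3, 14]
    print(seed(isomir), seed_match_sites(isomir, utr))   # AGGUAGU [2]
```

Shifting the 5’ end by a single nucleotide moves the seed one position along the miRNA, so the predicted sites in the same UTR fragment change. The two CUACCUC matches are replaced by a single, different ACUACCU match. This is only a toy illustration of why isomiRs could, in principle, alter the quality of miRNA effects.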


miRNAs as cancer classifiers
Aberrant miRNA levels reflect the physiological state of cancer cells and 
can be detected by miRNA expression profiling and harnessed for the 
purpose of diagnosis and prognosis**”*. In fact, miRNA profiling can 
be more accurate at classifying tumours than mRNA profiling because 
miRNA expression correlates closely with tumour origin and stage, and 
can be used to classify poorly differentiated tumours that are difficult to 
identify using a standard histological approach®"’. Whether or not this 
increased classification power relates to the biology of miRNAs or the 
reduced complexity of the miRNA genome still needs to be determined. 

BOX 1
Biogenesis and function of miRNAs

miRNAs are subjected to a unique biogenesis that is closely related to their regulatory functions. As the pathway in Fig. 1 shows, miRNAs are generally transcribed by RNA polymerase II into primary transcripts called pri-miRNAs’®. Similar to the transcripts of protein-coding genes, the primary transcripts contain a 5’ cap structure and a poly(A) tail and may include introns’®. They also contain a stem-loop structure, a region in which the sequences are not perfectly complementary. This structure is recognized in the nucleus by the ribonuclease Drosha and its partner DGCR8, which crop it to give rise to the precursor miRNA (pre-miRNA)’®. However, some intronic miRNAs (called mirtrons) bypass the Drosha processing step and instead use the splicing machinery to generate the pre-miRNA”. The pre-miRNA is exported from the nucleus to the cytoplasm by XPO5. It is then cleaved by the ribonuclease Dicer1 (along with TARBP2) into a double-stranded miRNA, a process known as dicing’®. Again, this cleavage can be substituted by Argonaute-2-mediated processing’.

After strand separation, the guide strand (the mature miRNA) combines with Argonaute proteins to form the RNA-induced silencing complex (RISC), whereas the passenger strand is usually degraded. The mature strand is important for specific target mRNA recognition and its consequent incorporation into the RISC’. The specificity of miRNA targeting is defined by the complementarity between the ‘seed’ sequence (positions 2 to 8 from the 5’ end of the miRNA) and the ‘seed-match’ sequence (generally in the 3’ untranslated region of the target mRNA). miRNAs silence the expression of their target mRNAs either by mRNA cleavage (‘slicing’) or by translational repression’. In addition, miRNAs have a number of unexpected functions, including targeting DNA or ribonucleoproteins and increasing the expression of a target mRNA®. Overall, these data indicate the complexity of miRNA-mediated gene regulation and highlight the importance of a better understanding of miRNA biology.

The special features of miRNAs make them potentially useful for detection in clinical specimens. For example, miRNAs are relatively resistant to ribonuclease degradation, and they can be easily extracted from small biopsies, frozen samples and even formalin-fixed, paraffin-embedded tissues’*. Furthermore, relatively simple and reproducible assays have been developed to detect the abundance of individual miRNAs. Methods that combine small RNA isolation, PCR and next-generation sequencing allow accurate and quantitative assessment of all the miRNAs expressed in a patient specimen, including material isolated by laser capture microdissection.
The diagnostic value of global miRNA expression patterns has not yet been proved; however, some individual miRNAs and small groups of miRNAs have shown promise. For example, in non-small cell lung cancer, the combination of high miR-155 and low let-7 expression correlates with a poor prognosis. In chronic lymphocytic leukaemia, a 13-miRNA signature is associated with disease progression’*"®. Further advances in the technology of miRNA profiling could help to revolutionize molecular pathology.

Perhaps the most appealing application of miRNAs as a cancer diag- 
nostic tool comes from the discovery of circulating miRNAs in serum. 
For example, miR-141 expression levels in serum were significantly 
higher in patients with prostate cancer than in healthy control individu- 
als’. Although the analysis of circulating miRNAs is only just begin- 
ning, the successful advancement of this technology could provide a 
relatively non-invasive diagnostic tool for single-point or longitudinal 
studies. With such diagnostic tools in place, miRNA profiling could be 
used to guide cancer classification, facilitate treatment decisions, monitor 
treatment efficacy and predict clinical outcome. 
When miRNA biogenesis goes awry

Although the expression of some miRNAs is increased in malignant 
cells, the widespread underexpression of miRNAs is a more common 
phenomenon. Whether this tendency is a reflection of a pattern associ- 
ated with specific cells of origin, is a consequence of the malignant state 
or actively contributes to cancer development is still unclear. Because 
miRNA expression generally increases as cells differentiate, the apparent
underexpression of miRNAs in cancer cells may, in part, be a result
of the cells being ‘locked’ in a less-differentiated state. Alternatively,
changes in oncogenic transcription factors that repress miRNAs or vari- 
ability in the expression or activity of the miRNA processing machinery 
could also be important. 

Two main mechanisms have been proposed as the underlying cause 
of the global downregulation of miRNAs in cancer cells. One involves 
transcriptional repression by oncogenic transcription factors. For exam- 
ple, the MYC oncoprotein, which is overexpressed in many cancers, 
transcriptionally represses certain miRNAs, although the extent to 
which this mediates its oncogenic activity or reflects a peripheral effect 
is still unknown”. The other mechanism proposed involves changes 
in miRNA biogenesis and is based on the observation that cancer cells 
often display reduced levels, or altered activity, of factors in the miRNA 
biogenesis pathway”' (Box 1, Fig. 1). 

In vivo studies have provided the most direct evidence of an active role 
for miRNA downregulation in at least some types of cancer. For exam- 
ple, analysis of mouse models in which the core enzymes of miRNA bio- 
genesis have been constitutively or conditionally disrupted by different 
mechanisms suggests that these molecules function as haploinsufficient 
tumour suppressors. Thus, the repression of miRNA processing by the
partial depletion of Dicer1 and Drosha accelerates cellular transformation
and tumorigenesis in vivo”. Furthermore, deletion of a single
Dicer1 allele in lung epithelia promotes Kras-driven lung adenocarcinomas,
whereas complete ablation of Dicer1 causes lethality because of the
need for miRNAs in essential processes”. Consistent with the potential 
relevance of these mechanisms, reduced Dicer1 and Drosha levels have 
been associated with poor prognosis in the clinic™*. In addition to the 
core machinery, modulators of miRNA processing can also function 
as haploinsufficient tumour suppressors. Hence, point mutations that
affect TARBP2 or XPO5 are correlated with sporadic and hereditary
carcinomas that have microsatellite instability””*. Other miRNA
modulators that influence the processing of only a subset of miRNAs could also
be important. For example, LIN28A and LIN28B can bind and repress
members of the let-7 family (which are established tumour-suppressor
miRNAs; Table 1). This binding can be counteracted by KHSRP
(KH-type splicing regulatory protein), another factor involved in miRNA
biogenesis. Together, these opposing activities dictate the level of
mature let-7. The processing of miRNAs can also be regulated by other factors,
including DDX5 (helicase p68) and the SMAD1 and SMAD5 proteins,
which may contribute to cancer development through the deregulation
of miRNAs”. Collectively, the global changes in miRNA expression that
are seen in cancer cells probably arise through multiple mechanisms; 
the combined small changes in the expression of many miRNAs seem 
to have a large impact on the malignant state. 


miRNAs as cancer drivers 
Functional studies show that miRNAs that are affected by somatic 
alterations in tumours can affect cancer phenotypes directly, therefore 
confirming their driver function in malignancy. Mechanistic studies show
that, as drivers of malignancy, these miRNAs interact with known
cancer networks: tumour-suppressor miRNAs can negatively
regulate protein-coding oncogenes, whereas oncogenic miRNAs often
repress known tumour suppressors (Fig. 2a). Perhaps the best example
of this is the oncogenic miR-17-92 cluster, in which individual miRNAs
suppress negative regulators of phosphatidylinositol-3-OH kinase signal- 
ling or pro-apoptotic members of the BCL-2 family, which disrupts the 
processes that are known to influence cancer development”® (Table 1). 
Cancer-associated miRNAs can also alter the epigenetic landscape 
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Table 1 | Key microRNAs involved in cancer 


REVIEW 


MicroRNA Function Genomic Mechanism Targets Cancer type Mouse models Clinical application 
location 
miR-17-92 | Oncogene 13q22 Amplification and BIM, Lymphoma, lung, breast, Cooperates with MYC Inhibition and 
cluster transcriptional PTEN, stomach, colon and to produce lymphoma. detection 
activation CDKN1A pancreatic cancer Overexpression induces 
and ymphoproliferative disease 
PRKAA1 
miR-155 Oncogene 21q21 Transcriptional SHIP1 and = Chronic lymphocytic Overexpression induces Inhibition and 
activation CEBPB leukaemia, lymphoma, pre-B-cell lymphoma and detection 
lung, breast and colon eukaemia 
cancer 
miR-21 Oncogene 17q23 Transcriptional PTEN, Chronic lymphocytic Overexpression induces Inhibition and 
activation PDCD4 and _ leukaemia, acute myeloid lymphoma detection 
TPM1 leukaemia, glioblastoma, 
pancreatic, breast, lung, 
prostate, colon and 
stomach cancer 
miR- Tumour 3q31 Deletion, mutation BCL2 and = Chronic lymphocytic Deletion causes chronic Expression with 
15a/16-1 suppressor and transcriptional MCL1 leukaemia, prostate cancer lymphocytic leukaemia mimics and viral 
repression and pituitary adenomas vectors 
et-7 family © Tumour 1 copies Transcriptional KRAS, Lung, colon, stomach, Overexpression suppresses Expression with 
suppressor (multiple repression MYC and ovarian and breastcancer lung cancer mimics and viral 
locations) HMGA2 vectors 
miR-34 Tumour p36 and Epigenetic silencing, | CDK4, Colon, lung, breast, No published studies Expression with 
‘amily suppressor 1q23 transcriptional CDK6, MYC kidney, bladder cancer, mimics and viral 
repression and and MET neuroblastoma and vectors 
deletion melanoma 
miR-29 Oncogene 7q32 and Transcriptional ZFP36 Breast cancer and indolent Overexpression induces No published 
‘amily q30 activation chronic lymphocytic chronic lymphocytic studies 
leukaemia leukaemia 
Tumour Deletion and DNMTs Acute myeloid leukaemia, No published 
suppressor transcriptional aggressive chronic studies 
repression lymphocytic leukaemia and 


lung cancer 


BCL2, B-cell lymphoma protein-2; BIM, BCL-2-interacting mediator of cell death; CDKN1A, cyclin-dependent kinase inhibitor 1A; CEBPB, CCAAT/enhancer binding protein β; HMGA2, high mobility group AT-hook 2; CDK4, cyclin-dependent kinase 4; CDK6, cyclin-dependent kinase 6; DNMT, DNA methyltransferase; MCL1, myeloid cell leukaemia sequence 1; PTEN, phosphatase and tensin homologue; PRKAA1, protein kinase, AMP-activated, alpha 1 catalytic subunit; PDCD4, programmed cell death 4; SHIP1, Src homology 2 domain-containing inositol 5-phosphatase 1; TPM1, tropomyosin 1; ZFP36, zinc finger protein 36.


of cancer cells. The cancer epigenome is characterized by global
and gene-specific changes in DNA methylation, histone modification
patterns and chromatin-modifying enzyme expression profiles, which
affect gene expression in a heritable way. On the one hand, miRNA
expression can be altered by DNA methylation or histone modifications
in cancer cells; on the other, miRNAs can also regulate components of
the epigenetic machinery, thereby contributing indirectly to the
reprogramming of cancer cells. For example, miR-29 inhibits DNMT3A
and DNMT3B expression in lung cancer, whereas miR-101 regulates the
histone methyltransferase EZH2 in prostate cancer. The presence of
mature miRNAs in the nucleus is another indication that miRNAs may
have a direct role in controlling epigenetic modifications, such as DNA
methylation and histone modifications; this hypothesis has been
established in plants but still needs to be demonstrated conclusively
in mammals.

In the same way as protein-coding genes, miRNAs can be oncogenes or
tumour suppressors depending on the cellular context in which they are
expressed, which means that defining their precise contribution to cancer
can be a challenge (Fig. 2b). This apparent paradox could be explained by
the tissue-specific expression of miRNAs and by the fact that their output,
as reflected in the cell’s physiology, depends on the expression pattern of
the specific mRNAs that harbour target sites. For example, the miR-29
family has a tumour-suppressive effect in lung tumours but appears
oncogenic in breast cancer because of its ability to target the DNA
methyltransferases DNMT3A and DNMT3B, and ZFP36, respectively (Table 1).
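The dependence of a miRNA’s output on which target-bearing mRNAs a cell expresses can be illustrated with a toy seed-matching sketch. It assumes the common simplification that a canonical target site is an exact 7-nucleotide match, in a 3′ UTR, to the reverse complement of miRNA positions 2 to 8 (the ‘seed’); real target prediction also weighs site type, conservation and sequence context. The miRNA used is the mature let-7a sequence, but the gene names and UTR fragments are hypothetical, chosen only so that the same miRNA represses different genes in two different ‘tissues’.

```python
# Toy illustration: a miRNA's effect depends on which seed-matched mRNAs are expressed.
# Assumption: a "site" is an exact match to the reverse complement of miRNA
# nucleotides 2-8 (a 7mer seed match). UTRs and gene names are hypothetical.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence (A<->U, G<->C)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_site(mirna: str) -> str:
    """Target-site sequence that pairs with miRNA positions 2-8 (1-based)."""
    return reverse_complement(mirna[1:8])


def count_sites(mirna: str, utr: str) -> int:
    """Count (possibly overlapping) 7mer seed matches in a 3' UTR."""
    site = seed_site(mirna)
    return sum(1 for i in range(len(utr) - len(site) + 1) if utr[i:i + len(site)] == site)


def repressed_genes(mirna: str, expressed_utrs: dict[str, str]) -> list[str]:
    """Genes, among those expressed in a given cell type, with at least one site."""
    return sorted(g for g, utr in expressed_utrs.items() if count_sites(mirna, utr) > 0)


MIRNA = "UGAGGUAGUAGGUUGUAUAGUU"  # mature let-7a; seed (nt 2-8) = GAGGUAG
# Hypothetical tissues expressing different hypothetical transcripts.
tissue_1 = {"GENE_A": "AAGCUACCUCAAA", "GENE_B": "GGGAAAUUU"}
tissue_2 = {"GENE_B": "GGGAAAUUU", "GENE_C": "UUCUACCUCUACCUCG"}

print(seed_site(MIRNA))                        # CUACCUC
print(repressed_genes(MIRNA, tissue_1))        # ['GENE_A']
print(repressed_genes(MIRNA, tissue_2))        # ['GENE_C']
print(count_sites(MIRNA, tissue_2["GENE_C"]))  # 2
```

The same miRNA ‘acts on’ GENE_A in one context and GENE_C in the other purely because of which transcripts are present, which is the intuition behind the context-dependent behaviour of miR-29 described above.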

To further complicate the process, some miRNAs repress several posi-
tive components of a pathway, whereas others target both positive and
negative regulators, possibly to buffer against minor physiological
variations that could otherwise trigger much larger changes in cell
physiology. In cancer cells, this buffering role can mean that some
miRNAs simultaneously target oncogenes and tumour-suppressor genes. In
addition, combinations of miRNAs can cooperate to regulate one or
several pathways, which increases the flexibility of regulation but
confounds experimentalists (Fig. 2c). Consequently, the way in which
miRNAs contribute to cancer development is conceptually similar to that
of cancer-associated transcription factors such as MYC and p53, whose
effects are mediated through many targets and depend on contextual
factors such as cell type and microenvironment. From a practical
perspective, it is crucial to study miRNA targets in the relevant
cellular context to determine what impact they will have on tumour-cell
behaviour (Fig. 2b).


Oncogenic pathways 

Beyond the impact of somatic genetic and epigenetic lesions, the altered
expression of miRNAs in cancer can arise through the aberrant activity
of transcription factors that control their expression. Interestingly,
the same transcription factors are often targets of miRNA-mediated
repression, which gives rise to complex regulatory circuits and feedback
mechanisms. Thus, a single transcription factor can activate or repress
several miRNAs and protein-coding genes; in turn, the altered miRNAs
can affect further protein-coding genes, amplifying the effects of a
single gene.

Figure 1 | Mechanisms of miRNA perturbation in cancer. Cancer cells
present global downregulation of miRNAs, loss of tumour-suppressor
miRNAs and specific accumulation of oncogenic miRNAs. The alteration
in miRNA expression patterns leads to the accumulation of oncogenes and
downregulation of tumour-suppressor genes, which promotes cancer
development. a, The expression and function of oncogenic miRNAs are
increased by genomic amplification, activating mutations, loss of epigenetic
silencing and transcriptional activation. By contrast, tumour-suppressor
miRNAs are lost by genomic deletion, inactivating mutations, epigenetic
silencing or transcriptional repression. b, After transcription, global levels
of miRNAs can be reduced by impaired miRNA biogenesis. Inactivating
mutations and reduced expression have been described for almost all the
members of the miRNA processing machinery. Downregulation of DROSHA
can decrease the cropping of primary miRNA (pri-miRNA) to precursor
miRNA (pre-miRNA). In the case of XPO5 mutation, pre-miRNAs are
prevented from being exported to the cytoplasm. Mutation of TARBP2 or
downregulation of DICER1 results in a decrease in mature miRNA levels.
Pol II, RNA polymerase II; RISC, RNA-induced silencing complex.

As already mentioned, MYC directly contributes to the global
transcriptional silencing of miRNAs. This repression involves the
downregulation of miRNAs with antiproliferative, antitumorigenic and
pro-apoptotic activity, such as let-7, miR-15a/16-1, miR-26a and miR-34
family members (Fig. 2d; Table 1). Initial studies indicate that MYC uses
both transcriptional and post-transcriptional mechanisms to modulate
miRNA expression. The post-transcriptional component could be explained
by LIN28A and LIN28B, which are direct targets of MYC and are required
for MYC-mediated repression of let-7 (ref. 38). Furthermore, MYC directly
activates transcription of the miR-17-92 polycistronic cluster, which,
given its oncogenic role, may contribute to MYC-induced tumorigenesis.
MYC-driven reprogramming of miRNA expression could also be a factor
in hepatocellular carcinoma, because this reprogramming contributes to
the aggressive phenotype of tumours originating from hepatic progenitor
cells. Some miRNAs, such as let-7, also regulate MYC, closing the
regulatory circuit.
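The MYC/let-7 circuit, in which each factor represses the other (Fig. 2d), is a double-negative or mutual-repression loop. The sketch below uses a generic, dimensionless toggle-switch model with Hill-type repression to show one property that such loops can have: bistability, meaning the state the system settles into depends on where it starts. The equations and the parameter values (alpha = 4, n = 2) are illustrative assumptions, not measured kinetics for MYC or let-7.

```python
# Generic mutual-repression (toggle-switch) model; parameters are illustrative only.
#   dM/dt = alpha / (1 + L**n) - M   (M: MYC level, repressed by let-7)
#   dL/dt = alpha / (1 + M**n) - L   (L: let-7 level, repressed by MYC)


def simulate(m0: float, l0: float, alpha: float = 4.0, n: float = 2.0,
             dt: float = 0.01, steps: int = 5000) -> tuple[float, float]:
    """Integrate the model with forward Euler and return the final (M, L)."""
    m, l = m0, l0
    for _ in range(steps):
        dm = alpha / (1.0 + l ** n) - m
        dl = alpha / (1.0 + m ** n) - l
        m, l = m + dt * dm, l + dt * dl
    return m, l


high_myc = simulate(3.0, 0.5)  # starts MYC-high
low_myc = simulate(0.5, 3.0)   # starts let-7-high
print(f"MYC-high start   -> M={high_myc[0]:.2f}, L={high_myc[1]:.2f}")
print(f"let-7-high start -> M={low_myc[0]:.2f}, L={low_myc[1]:.2f}")
```

With these parameters the two runs settle into opposite stable states (high MYC with low let-7, or the reverse), which is one way a closed regulatory circuit can lock in a cell state once it is tipped.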

miRNAs are embedded in many other oncogenic networks, including that
of KRAS, whose activation leads to the repression of several miRNAs. For
example, in pancreatic cancer with mutant KRAS, RAS-responsive
element-binding protein 1 (RREB1) represses the miR-143/145 promoter,
and at the same time both KRAS and RREB1 are targets of miR-143 and
miR-145, revealing a feedforward mechanism that increases the effect of
RAS signalling. Similarly, KRAS is a target of several miRNAs, of which
the let-7 family is the most representative example. The integration of
miRNAs into key oncogenic pathways, and the generation of feedforward
and feedback loops that have a balancing effect, creates intricate ways to
incorporate intracellular and extracellular signals into decisions about
cell proliferation or survival, and further implicates miRNAs in the
pathogenesis of cancer.


TP53 is a master regulator of miRNAs 

The TP53 tumour suppressor is perhaps the most important and
well-studied cancer gene, and it is not surprising that several studies
have suggested that miRNA biology can have a role in its regulation
and activity (Fig. 2e). The p53 protein acts as a sequence-specific
DNA-binding factor that can activate and repress transcription. Although
there is no doubt that most of the actions of p53 can be explained by
its ability to control canonical protein-coding targets such as CDKN1A
and PUMA, it can also transactivate several miRNAs. One of the
best-studied classes is the miR-34 family (Table 1), which represses genes
that can promote proliferation and inhibit apoptosis, plausible targets in a
p53-mediated tumour-suppressor response. In principle, the induction of
miR-34 and other miRNAs by p53 could explain some of its transcriptional
repressive functions.

The discovery of additional p53-regulated miRNAs, and the targeting
of p53 or its pathway by other miRNAs, has provided general insights
into the miRNA-mediated control of gene expression and potential
therapeutic opportunities for targeting the p53 network (Fig. 2e).
Several p53-activated miRNAs, such as miR-192, miR-194, miR-215
and miR-605, can target MDM2, which is a negative regulator of p53
and a therapeutic target. These potentially relevant miRNAs can be
epigenetically silenced in some types of cancer; however, their
reactivation or reintroduction (see the section ‘miRNAs as drugs and drug
targets’) offers an intriguing therapeutic opportunity for inhibiting
MDM2 in tumours that harbour wild-type p53 (refs 44, 45). Similarly,
p53 can also activate miR-107, miR-200 or miR-192, which inhibit
angiogenesis and epithelial-to-mesenchymal transition. Conversely,
p53 can be repressed by certain oncogenic miRNAs, including
miR-380-5p, which is upregulated in neuroblastomas with MYCN
amplification, and miR-504, which decreases p53-mediated apoptosis
and cell-cycle arrest and can promote tumorigenesis. However, the
extent to which these miRNAs control life-and-death decisions in the
p53 network still needs to be shown decisively to determine whether
they are valid therapeutic targets.

The studies mentioned have extended our understanding of the roles
and regulation of p53 into the world of small non-coding RNAs, but its
action on miRNA biology may be even more complex. For example,
one study suggests that p53 can affect miRNA biogenesis by promoting
pri-miRNA processing through association with the large Drosha
complex (Fig. 2e), but the precise mechanism remains unclear. More
conventionally, the p53 family member p63 transcriptionally controls
Dicer1 expression. Mutant TP53 can interfere with this regulation,
which leads to a reduction in Dicer1 levels and in the levels of certain
cancer-relevant miRNAs. Thus, with the p53 network as a typical
example, it is clear that miRNAs can interact with cancer-relevant
pathways at multiple and unexpected levels, and that a better
understanding of miRNA biology will help to decipher the role and
function of other important cancer genes.


Micromanagement of metastasis and beyond 

In addition to promoting cancer initiation, miRNAs can modulate
processes that support cancer progression, including metastasis. As
indicated earlier, changes in miRNA levels can occur through effects on
their transcription or through global changes in the RNA interference
(RNAi) machinery, and both mechanisms seem to be important for this
process. For example, in breast cancer, miR-10b and miR-9 can induce
metastasis, whereas miR-126, miR-335 and miR-31 act as suppressors.
The miR-200 family inhibits epithelial-to-mesenchymal transition, which
influences one aspect of the metastatic process. However, miR-200
could also promote the colonization of metastatic cells in breast cancer,
which provides yet another example of the opposing activities of some
miRNAs. In addition, in head and neck squamous-cell carcinomas, lung
adenocarcinomas and breast cancers, the reduced levels of certain
miRNAs that arise from Dicer1 downregulation also promote cell motility
and are associated with enhanced metastasis in experimental models.

The pleiotropic effects of miRNA biology on cancer extend to
virtually all acquired cancer traits, including cancer-associated changes
in intracellular metabolism and the tissue microenvironment. For
example, most cancer cells display alterations in glucose metabolism
termed the Warburg effect. miRNAs may contribute to this metabolic
switch: in glioma cells, miR-451 controls cell proliferation, migration
and responsiveness to glucose deprivation, thereby allowing the cells
to survive metabolic stress. The enhanced glutaminolysis observed
in cancer cells can be partially explained by MYC-mediated repression
of miR-23a and miR-23b (ref. 62) (Fig. 2d). In some cases, the control
of these cancer-related processes by miRNAs creates an opportunity
for new therapeutic approaches. For example, miR-132, which is present
in the endothelium of tumours but not in normal human endothelium,
induces neovascularization by inhibiting p120RasGAP, a negative
regulator of KRAS. The delivery of a miR-132 inhibitor with
nanoparticles that target the tumour vasculature suppresses angiogenesis
in mice, indicating potential for the development of new antiangiogenic
drugs. Further studies are likely to implicate miRNAs in the modulation
of every tumour-associated pathway or trait.


Big lessons from mice

Much of what we have learnt concerning the functional contribution of
miRNA biology to cancer development comes from studies in genetically
engineered mice. These systems provide powerful tools for the
genetic and biological study of miRNAs in an in vivo context, which is
particularly important given the contextual activity of most miRNAs. In
addition, owing to the ability of these models to recapitulate the
behaviour of some human malignancies, they are useful in preclinical
studies to evaluate new therapeutics.

Figure 2 | Contribution of miRNAs to cancer pathways. a, Tumour-
suppressor miRNAs, which repress oncogenes in healthy cells, are lost in
cancer cells, leading to oncogene upregulation, whereas oncogenic miRNAs
inhibit tumour-suppressor genes, giving rise to cancer. b, The presence
of different target genes in different cell lines can modify the function of
an miRNA, both in healthy cells and in cancer cells, which can lead to the
development of cancer or to a different outcome. c, Two miRNAs can function
together to regulate one or several pathways, which reinforces those pathways
and can result in the development of cancer. d, The oncogene MYC can either
repress tumour-suppressor miRNAs (in blue) or activate oncogenic miRNAs
(in red), and can therefore orchestrate several different pathways. MYC can
repress let-7 directly or indirectly, through LIN28 activation. Conversely,
let-7 can also repress MYC, which closes the regulatory circle. e, The tumour
suppressor p53 can regulate several tumour-suppressor miRNAs (blue),
activating different antitumoral pathways. The regulation of MDM2 by some
of these miRNAs leads to interesting feedforward loops. At the same time,
p53 can be negatively regulated by oncogenic miRNAs (in red). In addition,
p53 is involved in the biogenesis of several tumour-suppressor miRNAs.
Perhaps the most widespread use of mice for characterizing miRNA
biology in cancer is the validation of miRNAs that are altered in cancer
cells as bona fide oncogenes and tumour-suppressor genes. As already
mentioned, the first direct evidence that miRNAs have a function in
cancer came from mouse models, in which it was shown that expression
of the miR-17-92 cluster, which is amplified in some human B-cell
lymphomas, cooperates with Myc to promote B-cell lymphoma in mice.
Subsequent studies that used genetically engineered or
transplantation-based systems identified the relevant miRNA components,
showing that the miR-19 family (including miR-19a and miR-19b)
represents the most potent oncogenes in this cluster. Another example is
miR-155 overexpression in the lymphoid compartment, which triggers
B-cell leukaemia or a myeloproliferative disorder depending on the
system used to drive expression of the transgene; this was the first
example of an miRNA that initiates cancer in a transgenic setting
(Table 1).

Gene targeting has been used extensively to delete miRNAs for the
purpose of characterizing their physiological roles or their action as
candidate tumour suppressors. Gene targeting has suggested that miRNAs
from similar families have redundant or compensatory functions, as has
been shown for C. elegans. Ablation of the miR-15a and miR-16-1
cluster, which is often deleted in human chronic lymphocytic leukaemia,
predisposes mice to B-cell lymphoproliferative disease (Table 1).
Importantly, the ability to produce mouse strains with different gene
dosage through heterozygous or homozygous gene deletions has revealed
that Dicer1, whose complete loss is deleterious, can promote malignant
phenotypes as a haploinsufficient tumour suppressor. Such a conclusion
could not have been drawn from studies that examined only genomic data.

Figure 3 | In vivo miRNA expression or inhibition à la carte. a,
Tetracycline (Tet)-mediated miRNA inactivation or activation by
doxycycline administration, using either Tet-OFF systems, in which a
tissue-specific promoter (TSP) is combined with a transactivator (tTA) to
turn on expression of an oncogenic miRNA (purple) and induce
tumorigenesis (purple star) and subsequent tumour regression, revealing
dependence on the oncogenic miRNA, or Tet-ON systems, in which a
reverse transactivator (rtTA) switches on the oncogenic miRNA when the
drug is applied. Drug withdrawal leads to tumour regression. b,
Tet-mediated miRNA activation or inactivation by doxycycline
administration using Tet-OFF or Tet-ON systems. miRNAs (green) can be
inhibited by miRNA sponges (dark blue), with the same effects as
oncogenic miRNA expression, leading to tumorigenesis and subsequent
tumour regression, which indicates a dependence on tumour-suppressor loss.

Conditional gene expression systems in mice have allowed researchers
to determine cancer gene dependencies, as well as whether genes that
initiate cancer also participate in tumour maintenance. In many cases,
withdrawal of the initiating oncogenic transgene (or restoration of the
deleted or lost tumour suppressor) leads to the collapse of the tumour;
this validates the gene, or the pathway that it controls, as a therapeutic
target. Similar studies have also been applied to miRNAs; for example,
conditional expression of miR-21, which is broadly deregulated in cancer,
can promote lymphomagenesis in mice (Table 1). Silencing of miR-21
leads to disease regression, in part by promoting apoptosis (Fig. 3a).
Likewise, miRNA inhibitors (for example, antagomirs) directed against
miR-21 can inhibit the proliferation of human cancer cells that
overexpress miR-21 (ref. 71). Together, these studies suggest that miR-21
antagonists have the potential to be effective therapies for at least
some cancers.
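The on/off logic of the conditional systems in Fig. 3 can be written as a small truth table. This sketch assumes the standard behaviour of the two transactivators: tTA (Tet-OFF) drives the Tet-responsive transgene only in the absence of doxycycline, whereas rtTA (Tet-ON) drives it only in its presence, and either is made only where the tissue-specific promoter is active. The function and argument names are illustrative, not taken from any published tool.

```python
from enum import Enum


class TetSystem(Enum):
    TET_OFF = "tTA"   # transactivator active without doxycycline
    TET_ON = "rtTA"   # reverse transactivator active with doxycycline


def transgene_on(system: TetSystem, tsp_active: bool, doxycycline: bool) -> bool:
    """Return True if the Tet-responsive miRNA (or sponge) transgene is expressed.

    The transactivator is only produced where the tissue-specific promoter (TSP) is active.
    """
    if not tsp_active:
        return False
    if system is TetSystem.TET_OFF:
        return not doxycycline
    return doxycycline


# Oncogenic miRNA under Tet-ON: drug present -> expression (tumour initiation);
# drug withdrawn -> expression lost (regression if the tumour depends on the miRNA).
for dox in (True, False):
    print("Tet-ON, doxycycline =", dox, "->", transgene_on(TetSystem.TET_ON, True, dox))
```

The same function covers the sponge experiments in Fig. 3b: switching a sponge transgene off restores the tumour-suppressor miRNA, which is how dependence on tumour-suppressor loss is tested.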

The development of new technology has meant that mouse models are
increasingly used to study gene function on a large, if not genome-wide,
scale, and miRNAs are at the forefront of this revolution. Recently, a
vast collection of mouse embryonic stem-cell clones harbouring targeted
deletions of 392 miRNA genes was generated. This unique and valuable
toolbox, termed ‘mirKO’, will allow the creation of mice that lack
specific miRNAs or express mutant miRNAs, as well as the study of
miRNA expression. In a converse strategy, a collection of embryonic stem
cells engineered to inducibly express the vast majority of known miRNAs
is in production (S.W.L., Y. Park and G. Hannon, manuscript in
preparation) and will allow the in vivo validation of miRNAs as
oncogenes or as anticancer therapies. In a different strategy, miRNA
sponges (Fig. 3b), which are oligonucleotide constructs with multiple
complementary miRNA binding sites in tandem, have already been used
to deplete individual miRNAs in transgenic fruitflies, in transplanted
breast cancer cells in mice and in a transgenic mouse model. Although
these sponges provide a scalable strategy for miRNA loss-of-function
studies, more work is needed to rule out off-target effects and assess
their potency before conclusions can be drawn. Nevertheless, the
availability of such resources will help with the functional study of
miRNAs in normal development and disease, and will be useful to the
wider scientific community.
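A miRNA sponge, as described above, is a transcript that carries several binding sites for one miRNA in tandem. The sketch below assembles such a sequence from the reverse complement of the mature miRNA, optionally replacing the bases opposite miRNA nucleotides 9 to 12 so that they cannot pair, a bulge-like design often used to discourage cleavage of the sponge. That central-mismatch design, the default of four sites and the spacer sequence are assumptions for illustration, not a validated construct; the let-7a sequence is used only as an example input.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence (A<->U, G<->C)."""
    return "".join(COMPLEMENT[b] for b in reversed(rna))


def sponge_site(mirna: str, central_mismatch: bool = True) -> str:
    """One binding site, written 5'->3' on the sponge transcript.

    miRNA nucleotide i (1-based) pairs with site index len(mirna) - i (0-based).
    With central_mismatch, site bases opposite miRNA nt 9-12 are replaced by the
    miRNA base itself, giving non-Watson-Crick pairs (assumed bulge-like design).
    The seed-pairing region (opposite nt 2-8) is left fully complementary.
    """
    site = list(reverse_complement(mirna))
    if central_mismatch:
        for i in range(9, 13):
            site[len(mirna) - i] = mirna[i - 1]
    return "".join(site)


def build_sponge(mirna: str, n_sites: int = 4, spacer: str = "CAAU") -> str:
    """Tandem binding sites separated by a hypothetical spacer sequence."""
    return spacer.join(sponge_site(mirna) for _ in range(n_sites))


MIRNA = "UGAGGUAGUAGGUUGUAUAGUU"  # mature let-7a, used as an example input
print(sponge_site(MIRNA))         # AACUAUACAAGGAUCUACCUCA
print(len(build_sponge(MIRNA)))   # 100 (4 sites x 22 nt + 3 spacers x 4 nt)
```

Keeping the seed region perfectly complementary preserves binding, while the central mismatches are meant to make each site a decoy rather than a cleavage substrate; whether a given design is potent and specific still has to be established experimentally, as the text cautions.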

Finally, genetically engineered mouse models of human cancers are
a testing ground for preclinical studies. For example, in Myc-induced
liver tumours, miR-26 delivery by adeno-associated viruses suppresses
tumorigenesis by inducing apoptosis. The increasing use of
state-of-the-art mouse models is likely to uncover new in vivo functions,
such as roles in metastasis and angiogenesis, that would otherwise have
remained hidden in vitro. These models will also provide key preclinical
systems for testing miRNA-based therapeutics.


Constructing and deconstructing cancer 

The use of RNAi technology, a tool that exploits miRNA pathways,
has revolutionized the study of gene function in mammalian systems
and has provided a powerful means to investigate the function of any
protein-coding gene. Experimental triggers of RNAi exploit different
aspects of the pathway and downregulate gene expression by entering
the miRNA biogenesis machinery at different points. Small interfering
RNAs (siRNAs), which function at the level of Dicer1, can lead to
transient but potent gene suppression; these RNAi triggers, or their
variants, are probably the structural ‘scaffold’ for miRNA therapeutics
(see the section ‘miRNAs as drugs and drug targets’).

Stable RNAi can be activated by the expression of miRNA mimetics,
either so-called stem-loop short hairpin RNAs (shRNAs) or shRNAs that
incorporate a larger miRNA fold. One example of the latter is based on
miR-30 (known as miR-30-based shRNAs or ‘shRNAmirs’). These shRNAs,
as occurs naturally for many miRNAs, can be embedded in non-coding
sequences of protein-coding transcripts or linked in tandem, which
allows, for example, the linkage of the shRNA with a fluorescent reporter
or the simultaneous knockdown of two different genes. Advances in the
shRNAmir methodology have allowed the development of versatile vectors
for the study of proliferation and survival genes, strategies for
optimizing the potency of shRNAs, and rapid and effective systems for
conditional shRNA expression in mice. The last of these, together with
systems based on short stem-loop shRNAs, could eventually allow the
spatial, temporal and reversible control of any gene in vivo.

© 2012 Macmillan Publishers Limited. All rights reserved

Regardless of the platform, RNAi technology provides an effective
tool to investigate cancer phenotypes and identify therapeutic targets.
For example, RNAi has been used to identify and characterize
tumour-suppressor genes, the inhibition of which promotes cancer
development. Early studies, using the same system that validated
miR-17-92 as an oncogene, demonstrated that inhibition of TP53 could
produce phenotypes consistent with TP53 loss. Later studies showed
that tumour suppressors could be identified prospectively using in vitro
and in vivo shRNA screens (for examples, see refs 84 and 85). By
conditionally expressing shRNAs that target tumour suppressors in mice,
tumour-suppressor function in advanced tumours can be re-established
by silencing the shRNA. Tumour-suppressor reactivation leads to
marked (if not complete) tumour regression, which validates these
pathways as therapeutic targets.

RNAi technology can be exploited more directly to identify
genotype-specific cancer drug targets. Although the outcomes of RNAi
and of small-molecule-mediated protein inhibition may differ, siRNAs
and shRNAs have been widely used to determine whether a candidate
target is required for the proliferation of cancer cells. Moreover, the
availability of RNAi libraries that target portions of, or all of, the
human genome allows genetic screens to identify ‘synthetic lethal’
genes, whose combined attenuation triggers cell death. In principle,
identifying an RNAi target whose inhibition is selectively lethal to cells
harbouring a particular oncogenic alteration should reveal
cancer-specific targets. Such approaches have identified potential
targets for tumours expressing oncogenic KRAS and for leukaemias with
deregulated MYC (ref. 90). These approaches could complement
traditional drug-target discovery and might provide a systematic way to
identify the combinations of therapies that will ultimately be needed to
combat cancer.
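The logic of such a synthetic-lethal screen can be sketched as a comparison of shRNA depletion between cells with and without the oncogenic alteration. The sketch assumes hypothetical read counts for each shRNA before and after a period of growth, a pseudocount, log2 fold changes and a simple differential score with an arbitrary threshold; real screens add library-size normalization, replicates, several shRNAs per gene and proper statistics.

```python
import math

# Hypothetical shRNA read counts: (before, after) growth, per isogenic cell line.
COUNTS = {
    "shGENE_A": {"wild_type": (1000, 950), "mutant": (1000, 120)},
    "shGENE_B": {"wild_type": (800, 100), "mutant": (800, 90)},    # kills both: not selective
    "shGENE_C": {"wild_type": (1200, 1150), "mutant": (1200, 1100)},
}


def log2_fc(before: int, after: int, pseudocount: float = 1.0) -> float:
    """log2 fold change in shRNA abundance; negative values mean depletion."""
    return math.log2((after + pseudocount) / (before + pseudocount))


def synthetic_lethal_hits(counts: dict, threshold: float = -2.0) -> list[str]:
    """shRNAs depleted in mutant cells by at least |threshold| log2 units more than in wild type."""
    hits = []
    for shrna, lines in counts.items():
        diff = log2_fc(*lines["mutant"]) - log2_fc(*lines["wild_type"])
        if diff <= threshold:
            hits.append(shrna)
    return sorted(hits)


print(synthetic_lethal_hits(COUNTS))  # ['shGENE_A']
```

shGENE_B is strongly depleted in both lines, so it scores as generally essential rather than selectively lethal; only shGENE_A shows the genotype-specific dependency that such screens look for.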


miRNAs as drugs and drug targets 

Despite advances in techniques to inhibit protein-coding genes using
small molecules or biologicals, many cancers are unresponsive to the
agents currently in use or become resistant to them; new and more
creative approaches are therefore required for the treatment of cancer.
Perhaps one of the most exciting opportunities that has arisen from
our understanding of miRNA biology is the potential use of miRNA
mimics or antagonists as therapeutics. Owing to the ability of miRNAs
to simultaneously target multiple genes and pathways that are involved
in cellular proliferation and survival, the targeting of a single miRNA
can be a form of ‘combination’ therapy that could obstruct feedback
and compensatory mechanisms that would otherwise limit the
effectiveness of many therapies in current use. In addition, because
miRNA expression is often altered in cancer cells, agents that modulate
miRNA activity could potentially produce cancer-specific effects. On this
basis, anticancer therapies that inhibit or enhance miRNA activity are
being developed (Fig. 4). Support for this approach comes from
tumour-bearing mice, in which inhibiting oncogenic miRNAs or
expressing tumour-suppressor miRNAs has a significant effect on the
outcome of cancer. Oncogenic miRNAs can be blocked by using
antisense oligonucleotides, antagomirs, sponges or locked nucleic acid
(LNA) constructs. The use of LNAs has achieved unexpected success
in vivo, not only in mice but also for the treatment of hepatitis C in
non-human primates. The downregulation of miR-122 can significantly
inhibit replication of the hepatitis C virus, which is thought to decrease
the risk of chronic hepatitis and hepatocellular carcinoma in patients
who are hepatitis C-positive. Early clinical studies of SPC3649, an
miR-122 antagonist, in healthy individuals to assess toxicity will provide
valuable information about the pharmacokinetics and safety of the
treatment. LNAs have been optimized to target miRNAs by reducing
their molecular size and this, along with the development of strategies
for more efficient delivery, has increased their therapeutic potential. An
alternative strategy involves restoring tumour-suppressor miRNA
expression with synthetic miRNA mimics or viral delivery. Both of these
approaches have yielded positive results in mouse models of cancer.
Adeno-associated virus delivery of miRNAs or miRNA antagonists has
the advantage of being efficient and, because the virus does not
integrate into the genome, non-mutagenic. However, the delivery and
safety of treatment need to be improved before this approach can
achieve widespread clinical use.

Figure 4 | Proposed scheme for the treatment of liver cancer with combined
chemotherapy and miRNA-based therapy. a, miRNA expression profiles
of potential patients could be assessed by measuring circulating miRNAs
in patient serum or tumoral miRNAs from a biopsy. For example, miR-21
expression and miR-26 loss could be detected in serum and tumour samples.
b, This profile could be used for early detection of cancer, accurate diagnosis
and prognosis, and choosing the best therapeutic strategy. The best available
chemotherapeutic option could be combined with miRNA-based therapy. c,
The oncomiRs detected in miRNA profiling and present in the tumour,
such as miR-21, could be inhibited using different strategies, such as locked
nucleic acid constructs. By contrast, the expression of tumour-suppressor
miRNAs downregulated in the tumour could be restored; for example, miR-26
levels could be increased with miRNA mimics. d, After treatment, the patient
could be checked for relapse by periodically studying circulating miRNAs
from serum in a non-invasive manner. The presence of miR-21 could indicate
a potential relapse, and treatment would resume (black arrows).

In principle, the use of miRNA mimetics as therapeutics would allow
‘drugging the undruggable’, or the therapeutic inhibition of virtually
any human gene. If this were possible, it would undoubtedly affect
many diseases, including cancer, by allowing the targeting of oncogenic
transcription factors that are difficult to inhibit through traditional
medicinal chemistry. Furthermore, because similar chemistry is used to
create drugs that target diverse molecules, the implementation of
miRNA-based therapies could allow a more uniform drug-development
pipeline than is possible for more conventional treatments. Although
experimental studies have validated the underlying biological impact of
miRNA modulation, practical challenges still prevent the clinical use of
miRNA mimetics and antagonists, including uncharacterized off-target
effects, toxicities and poor delivery. Concerning the last of these, most
miRNA mimetics and antagonists rely on delivering molecules that mimic
or inhibit the ‘seed’ sequence of an miRNA (typically molecules that
consist of 26 nucleotides or related structures) across the plasma
membrane. This is a particular challenge in the treatment of cancer, in
which missing even a few cancer cells could lead to tumour relapse and
progression. Extensive research is now focused on the viral and
non-viral strategies required to meet this challenge, and results in the
preclinical setting are promising. Despite the considerable hurdles that
have to be overcome, it seems likely that miRNAs will find a place
alongside more conventional approaches for the treatment of cancer.


Perspectives 

Since the discovery of miRNAs in model organisms, miRNAs have 
emerged as key regulators of normal development and a diversity of 
normal cellular processes. Given what we know now, it is not surpris- 
ing that perturbations in miRNA biogenesis or expression can con- 
tribute to disease. In cancer, the effects of miRNA alteration can be 
widespread and profound, and they touch on virtually all aspects of the 
malignant phenotype. Yet, precisely how miRNAs regulate the expres- 
sion of protein-coding genes is not completely understood, and the 
underlying mechanism remains an important basic-science question 
that will have a significant impact on our understanding of gene regu- 
lation and its alteration in disease. In addition, we still lack effective 
approaches to understand and predict miRNA targets. New strategies 
to identify and characterize the targets of individual miRNAs, and to 
determine how they function in combination to regulate specific targets, 
will be required to understand their action on cell physiology. Because
miRNAs can also regulate other non-coding RNAs (for example, long
non-coding RNAs) that have a role in cancer development, and vice
versa, these interactions will increase the complexity of gene regulation
and are likely to produce regulatory processes that are currently hidden.
Pioneering knowledge, gained through the study of miRNA function 
and regulation, will undoubtedly provide methodological and theo- 
retical insights that will help in our understanding of the more recently 
identified non-coding RNA species. 

Understanding miRNA biology and how it contributes to cancer development is not only an academic exercise but also provides an opportunity to generate new ideas for diagnosis and treatment. RNAi-based technology has allowed sophisticated loss-of-function experiments that were previously impossible and has revealed therapeutic targets that, when inhibited, can lead to cancer cell elimination. In addition, miRNAs themselves are being used directly in the diagnosis of cancer and, in the future, will probably be exploited in therapy, either to identify drug targets or as drugs themselves. However, cost-effective miRNA profiling strategies and larger studies are needed to determine whether miRNA profiling provides an advantage for cancer classification compared with more traditional approaches. Drugs that function as miRNA mimetics, antagonists or synthetic siRNAs form the core of what is fundamentally a new class of drugs, capable of targeting molecules outside the range of traditional medicinal chemistry; however, their clinical implementation will require improvements in drug composition and delivery, challenges that lie outside the scope of molecular biology and instead involve the fields of chemistry and nanotechnology. Nevertheless, the successful development of these technologies could ultimately translate our understanding of miRNA biology in cancer into strategies for the control of cancer.
Eutrophication causes speciation reversal in whitefish adaptive radiations


P. Vonlanthen, D. Bittner, A. G. Hudson, K. A. Young, R. Müller, B. Lundsgaard-Hansen, D. Roy, S. Di Piazza, C. R. Largiader & O. Seehausen


Species diversity can be lost through two different but potentially interacting extinction processes: demographic decline 
and speciation reversal through introgressive hybridization. To investigate the relative contribution of these processes, 
we analysed historical and contemporary data of replicate whitefish radiations from 17 pre-alpine European lakes and 
reconstructed changes in genetic species differentiation through time using historical samples. Here we provide 
evidence that species diversity evolved in response to ecological opportunity, and that eutrophication, by 
diminishing this opportunity, has driven extinctions through speciation reversal and demographic decline. Across the 
radiations, the magnitude of eutrophication explains the pattern of species loss and levels of genetic and functional 
distinctiveness among remaining species. We argue that extinction by speciation reversal may be more widespread than 
currently appreciated. Preventing such extinctions will require that conservation efforts not only target existing species 
but identify and protect the ecological and evolutionary processes that generate and maintain species. 


Effectively counteracting the biodiversity crisis requires identifying and protecting the ecological and evolutionary processes that generate and maintain diversity. Species can go extinct through two distinct but potentially interacting processes. In the first, demographic decline results in population extirpation and eventually the total extinction of the species. In the second, introgressive hybridization erodes differentiation until species collapse into a hybrid swarm. A special case of introgressive hybridization is speciation reversal, in which changes in selection regimes increase gene flow between sympatric species, thus eroding genetic and ecological differences. Speciation reversal may be particularly important in adaptive radiations with recently diverged sympatric species that lack strong intrinsic post-zygotic isolation.

Adaptive radiation is the evolution of ecological diversity in rapidly speciating lineages. It is often characterized by 'ecological speciation', in which traits that are under divergent natural selection, or those genetically correlated with them, contribute to reproductive isolation. When reproductive isolation between ecologically differentiated populations is maintained by the temporal and spatial clustering of breeding aggregations, adaptive radiation occurs through the correlated partitioning of ecological and reproductive niche spaces. Because intrinsic post-zygotic isolation is typically weak during adaptive radiation, environmental changes that reduce niche space and relax the selective forces maintaining reproductive isolation can lead to extinction by speciation reversal.

Fish of post-glacial lakes are model systems for studying adaptive radiation owing to their recent origins and repeated patterns of diversification in independent lineages. These radiations are characterized by the correlated partitioning of ecological and reproductive niche spaces. In the European Alps, at least 25 lakes harbour 1 to 5 whitefish species (Coregonus spp.) (Fig. 1a and Supplementary Table 1). For 17 of these lakes, 13 of which contain multiple sympatric species, the whitefish diversity was described by Steinmann 60 years ago. This diversity has arisen since deglaciation within nine hydrologically independent lake systems.

Reproductive isolation in central European whitefish radiations is maintained mainly by pre-zygotic mechanisms (divergence in spawning depth, spawning time and possibly mate choice (B. Lundsgaard-Hansen et al., unpublished data)) and by extrinsic rather than intrinsic post-zygotic mechanisms. Generally, large-bodied species with few, widely spaced gill-rakers (benthic invertebrate feeders) spawn in winter in shallow littoral habitats, whereas small-bodied species with many densely spaced rakers (zooplankton feeders) spawn in deeper water in winter or summer. Exceptions to this rule are profundal summer-spawning species with very low numbers of gill-rakers that exist in Lake Thun and existed in Lake Constance. Summer-spawning species choose cold and well-oxygenated spawning habitats below the thermocline (>20 m in depth). Eggs settle onto the lake-floor sediment and require an oxygenated water-sediment interface to develop and hatch. Because whitefish use most of the lacustrine habitats, and because of their large biomass and ecological diversity, they are keystone species in the ecosystems of pre-alpine lakes, which are commonly referred to as whitefish lakes.

Although eutrophication threatens lake ecosystems worldwide, the manner and mechanisms by which it has affected adaptive radiations, and whitefish in particular, remain unclear. Many Swiss whitefish lakes lie in densely populated areas and were subjected to high nutrient inputs in the twentieth century, a fact that led Steinmann to suggest in 1950 that eutrophication was the cause of the extinction of eight whitefish populations. By the 1970s, eutrophication had increased primary production in all Swiss lakes (Fig. 2d and Supplementary Fig. 1). The associated increase in microbial decomposition rates resulted in oxygen depletion at the water-sediment interface, especially below the thermocline, leading to reduction or complete failure of whitefish recruitment. Eutrophication also affected the biomass and diversity of zooplankton (Supplementary Fig. 2) and probably of benthic invertebrates, thus altering the ecological and reproductive niche spaces that were associated with whitefish radiations. Improved sewage treatment and phosphorus management have allowed some lakes to return to near their natural trophic state (Fig. 2d). However, in other lakes, the sediment-water interface remains anoxic and zooplankton biomass is higher than before eutrophication.

Figure 1 | Distribution of historical whitefish diversity and recent diversity loss. a, Whitefish species diversity in Swiss lakes (numbered as in Table 1; fish are named in Supplementary Information; for details of taxonomy see quantification of whitefish diversity in Supplementary Information). b, Species richness change in 17 lakes. c, Functional diversity change in 16 lakes. Red ellipses, more than 10% diversity loss; white ellipses, little or no change; blue ellipses, increase in diversity of more than 10%. The observed functional diversity increase in Lake Brienz is due to the presence of one species (C. sp. 'Balchen') that Steinmann was unaware of.

We suspected that loss of deep spawning habitat weakened reproductive isolation and that, at the same time, increased productivity led to an increase in zooplankton density at the expense of zooplankton diversity (Supplementary Fig. 2), whereas the associated hypoxia probably reduced zoobenthos density in the profundal zone. By affecting the availability of one prey type more than the other along the principal axis of whitefish feeding divergence, eutrophication probably changed the shape of the adaptive landscape from multimodal towards unimodal or flat, thus relaxing divergent selection. We therefore proposed that eutrophication caused speciation reversal in addition to demographic decline. We show that the speciation reversal hypothesis is supported by historical and contemporary patterns of diversity across lakes and by changes through time in the genetic and phenotypic distinctiveness of sympatric species.


Diversity loss in polluted lakes

Most whitefish assemblages have lower species and functional diversity today than historically (Fig. 1, Table 1, Supplementary Table 1 and quantification of whitefish diversity in Supplementary Information). On average, species richness has decreased by 38% (Wilcoxon test N = 17, V = 91, P < 0.001), functional diversity (range in gill-raker numbers) by 14% (N = 16, V = 60, P = 0.018) and the difference between sympatric species in gill-raker mean counts by 28% (Welch's t-test N = 8, t = 7.79, d.f. = 7, P < 0.001). Declines in species richness were explained by eutrophication level (linear regression N = 17, R² = 0.50, P < 0.001; Fig. 2a and Supplementary Table 2). Reductions in gill-raker count range were poorly predicted by eutrophication, probably because some variation is retained in hybrid swarms and because stocking programmes have maintained some diversity even in the most polluted lakes (Table 1 and Supplementary Table 1). Eutrophication reduced the oxygenated depth (depth range with O2 > 2.5 mg l⁻¹; see Supplementary Information) across 16 lakes (Supplementary Fig. 3). Egg survival was measured in a subset of those lakes and was found to decrease with nutrient load (N = 12, R² = 0.45, P = 0.010; Fig. 3f) and was close to zero once the maximum phosphorus concentration exceeded 150 µg l⁻¹.
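The Wilcoxon statistic above can be unpacked with a short sketch. Assuming V is the signed-rank statistic of paired historical-minus-contemporary species counts with zero differences dropped (the convention of R's `wilcox.test`; the text does not name the software), V = 91 is exactly 13 × 14 / 2, the largest value possible when all 13 nonzero differences point the same way. That matches the 4 of 17 lakes in Table 1 with no species loss. The helper names `percent_change` and `signed_rank_v` are hypothetical, and the sketch computes V only, not a P value.

```python
from typing import List, Sequence


def percent_change(historical: float, contemporary: float) -> float:
    """Percentage change in a diversity metric (negative values mean loss)."""
    if historical == 0:
        raise ValueError("historical value must be non-zero")
    return 100.0 * (contemporary - historical) / historical


def signed_rank_v(before: Sequence[float], after: Sequence[float]) -> float:
    """Wilcoxon signed-rank statistic V for paired samples.

    V is the sum of the ranks of |before - after| over the pairs whose
    difference is positive. Zero differences are dropped and tied absolute
    differences share their average rank.
    """
    diffs: List[float] = [b - a for b, a in zip(before, after) if b != a]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        # Find the run of sorted positions i..j sharing the same |difference|.
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j + 2) / 2.0  # 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return sum(rank for d, rank in zip(diffs, ranks) if d > 0)


# Hypothetical layout: 13 lakes that lost species and 4 that lost none.
# Any 13 positive differences give the same V.
historical = [2] * 13 + [3] * 4
contemporary = [1] * 13 + [3] * 4
print(signed_rank_v(historical, contemporary))  # 91.0
```

For example, `percent_change(3, 2)` gives about -33.3, the species loss listed for Lakes Zürich and Walen in Table 1.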


Predicting the origin and loss of diversity 

Because available depth affects the diversity of spawning and feeding habitats, and because all lakes were oxygenated to the greatest depths before eutrophication, we expected maximum lake depth (Dmax) to predict pre-eutrophication diversity. By contrast, we expected maximum phosphorus concentration (Pmax) and minimum oxygenated depth (DO2,min) during eutrophication to predict patterns of contemporary diversity (Table 1).

Maximum lake depth does indeed predict historical species richness (N = 17, R² = 0.48, P = 0.001; Fig. 3a) and the use of vulnerable reproductive niches. This pattern held when tested with evolutionarily independent lineages from hydrologically isolated lake groups as the unit of observation (N = 9, R² = 0.51, P = 0.019). Historical functional diversity also increased with maximum lake depth (N = 17, R² = 0.30, P = 0.013; Fig. 3b). Lakes that historically harboured summer- and deep-spawning species are significantly deeper than lakes that did not (N_Summer = 8, N_NoSummer = 9: t = 3.05, d.f. = 12.97, P = 0.006; N_Deep = 11, N_NoDeep = 6: t = 5.05, d.f. = 14.36, P < 0.001).

Figure 2 | Diversity loss through speciation reversal. a, Species diversity loss regressed against the maximum phosphorus concentration, Pmax (µg l⁻¹). b, The pairwise FST values among three Coregonus species from Lake Constance observed through time (C. arenicolus versus C. wartmanni; C. wartmanni versus C. macrophthalmus; C. macrophthalmus versus C. arenicolus). c, The global genetic differentiation among species within each lake plotted against the maximum phosphorus concentration. d, Fifty-year trends in phosphorus concentration in our study lakes. Lake Constance is highlighted as a blue gradient surface. Lake details are given in Supplementary Fig. 1. e, Ranges of species means in gill-raker counts for each lake, before (historical; 1, shown in blue) and after (contemporary; 2, shown in red) pollution. Lakes are arranged from weakly to strongly polluted. For panels a and c, the dashed lines represent the 95% confidence intervals for the regression line.

Oxygenated depth during eutrophication predicts contemporary species richness and functional diversity slightly better than does maximum lake depth, although the difference is not significant (difference in Akaike's corrected information criterion (ΔAICc) = 1.96 and 1.91; see regression model selection in Supplementary Information; species richness: N = 17, R² = 0.55, P < 0.001, Fig. 3c, versus R² = 0.49, P < 0.001; functional diversity: N = 16, R² = 0.40, P = 0.005, Fig. 3d, versus R² = 0.32, P = 0.013). This was also true for historical species richness and functional diversity, but oxygenated depth explained slightly more of the variance in contemporary than in historical diversity (Supplementary Table 2). Moreover, lakes that lost summer- or deep-spawning species were more eutrophied than those that retained these species (N_SummerLoss = 3, N_SummerNoLoss = 5: t = 3.04, d.f. = 5.99, P = 0.023; N_DeepLoss = 3, N_DeepNoLoss = 8: t = 2.98, d.f. = 7.13, P = 0.020), whereas maximum depth did not differ between these lakes (N_SummerLoss = 3, N_SummerNoLoss = 5: t = 0.44, d.f. = 5.90, P = 0.675; N_DeepLoss = 3, N_DeepNoLoss = 8: t = 0.301, d.f. = 2.27, P = 0.783).
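The fractional degrees of freedom quoted above (for example d.f. = 5.99 or 14.36) are characteristic of Welch's unequal-variance t-test, whose d.f. come from the Welch-Satterthwaite approximation, and the model comparison rests on AICc. The sketch below shows both, plus the R² of a one-predictor regression. It assumes Gaussian errors and counts the residual variance as a parameter (k = 3 with intercept and slope), a convention the text does not state, so absolute AICc values may differ from the authors'. All function names are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def ols_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares line y = a + b*x; returns (a, b, residual sum of squares)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return a, b, rss


def r_squared(y: Sequence[float], rss: float) -> float:
    """Coefficient of determination from the residual sum of squares."""
    my = mean(y)
    tss = sum((yi - my) ** 2 for yi in y)
    return 1.0 - rss / tss


def aicc(rss: float, n: int, k: int = 3) -> float:
    """Small-sample AIC of a Gaussian linear model, up to a constant shared by
    all models fitted to the same n observations. k counts every estimated
    parameter (assumed here: intercept, slope and residual variance)."""
    return n * math.log(rss / n) + 2 * k + 2 * k * (k + 1) / (n - k - 1)


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

For two one-predictor models fitted to the same lakes (say, oxygenated depth versus maximum depth), `aicc(rss_1, n) - aicc(rss_2, n)` reduces to n·ln(rss_1/rss_2), so the ΔAICc does not depend on how k is counted.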

Among lakes, the contemporary number of genetically differentiated species (see Methods) is best predicted by maximum depth (N = 8, R² = 0.50, P = 0.031; Fig. 3e). The level of genetic differentiation among species, on the other hand, is predicted by the severity of eutrophication, with which it is strongly negatively correlated (N = 8, R² = 0.83, P < 0.001; Fig. 2c). Combined with the previous results, these data suggest that the depth-mediated legacy of adaptive radiation has been modified by speciation reversal driven by eutrophication.


Species loss through speciation reversal 


If extinction resulted from demographic decline, pairwise genetic differentiation among contemporary species at neutral markers (measured using the fixation index, FST) would remain unchanged or increase owing to genetic drift as effective population sizes declined. By contrast, extinction by speciation reversal should involve declines in pairwise FST values among extant species. Lake Constance suffered eutrophication, but phosphorus concentrations never exceeded 150 µg l⁻¹, the level at which egg development fails even in shallow waters (Table 1). We extracted DNA from samples of all species collected before (1926-50, Pmax < 10 µg l⁻¹), during (1969-80, Pmax = 87 µg l⁻¹) and after (1990-2004, Pmax = 39 µg l⁻¹) peak eutrophication. Genetic cluster analysis identified four species, all of which were well represented in pre-eutrophication scale samples. Of all the post-eutrophication samples, only five individuals were assigned to the now extinct summer- and deep-spawning Coregonus gutturosus (Supplementary Table 3). However, the morphological (gill-raker counts) and reproductive (winter instead of summer spawning) traits of these individuals did not match those of historical C. gutturosus. We therefore calculated pairwise FST with and without these genetically assigned C. gutturosus-like individuals. Pairwise genetic differentiation among the three extant species has dropped dramatically through time (Fig. 2b) and global FST has decreased more than twofold (0.108/0.165 to 0.046/0.047, without/with C. gutturosus, respectively).
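The section does not say which FST estimator was used. As a swapped-in illustration only, the sketch below computes Nei's GST for a single locus from allele frequencies, GST = (HT - HS) / HT, where HS is the mean within-species expected heterozygosity and HT the expected heterozygosity of the pooled frequencies. Multilocus values are usually obtained by combining loci, which is not shown. The last line checks the "more than twofold" arithmetic for the global values quoted above. Function names are hypothetical.

```python
from typing import List


def expected_heterozygosity(freqs: List[float]) -> float:
    """Expected heterozygosity 1 - sum(p_i^2) for one locus."""
    return 1.0 - sum(p * p for p in freqs)


def nei_gst(pop_freqs: List[List[float]]) -> float:
    """Nei's G_ST = (H_T - H_S) / H_T for one locus.

    pop_freqs[j][i] is the frequency of allele i in species j; species are
    weighted equally, and every row must list the same alleles in the same order.
    """
    n_pops = len(pop_freqs)
    h_s = sum(expected_heterozygosity(f) for f in pop_freqs) / n_pops
    pooled = [sum(f[i] for f in pop_freqs) / n_pops for i in range(len(pop_freqs[0]))]
    h_t = expected_heterozygosity(pooled)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


# Fold decrease of the global values quoted in the text (without / with C. gutturosus).
print(round(0.108 / 0.046, 2), round(0.165 / 0.047, 2))  # 2.35 3.51
```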

Speciation reversal should also increase genetic variation within extant species. Consistent with this prediction, allelic richness has increased through time in Coregonus wartmanni (N = 10, d.f. = 8, t = 3.38, P = 0.009), and a similar trend is seen in Coregonus macrophthalmus (N = 10, d.f. = 8, t = 2.17, P = 0.062; Supplementary Table 4). Of 11 alleles found only in C. gutturosus among the pre-eutrophication samples (private alleles), 5 were found in contemporary Coregonus species of Lake Constance (Supplementary Table 5). Had these alleles been present in the other species before eutrophication at similar frequencies, the probability of finding at least one of them in the pre-eutrophication samples of those species would have been 98%; their absence from those samples therefore suggests that the extinction of C. gutturosus involved hybridization with other species.
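Two quantities in this paragraph can be sketched under stated assumptions. Allelic richness is commonly compared across samples of unequal size by rarefaction: the expected number of distinct alleles in a subsample of g gene copies, the sum over alleles of 1 - C(N - N_i, g) / C(N, g). The text does not say which method was used. The detection probability assumes randomly sampled diploid individuals (2n gene copies per locus), Hardy-Weinberg proportions and independence among loci; alleles at the same locus are pooled because their draws are not independent. Helper names are hypothetical and no study data are included.

```python
from math import comb, prod
from typing import Dict, List


def rarefied_allelic_richness(counts: List[int], g: int) -> float:
    """Expected number of distinct alleles in a random subsample of g gene copies.

    counts[i] is the number of copies of allele i observed at one locus.
    """
    n = sum(counts)
    if not 0 < g <= n:
        raise ValueError("g must be between 1 and the number of gene copies")
    # comb(n - n_i, g) is 0 when g > n - n_i, i.e. the allele is certain to be drawn.
    return sum(1 - comb(n - n_i, g) / comb(n, g) for n_i in counts if n_i > 0)


def prob_detect_any(freqs_by_locus: Dict[str, List[float]], n_individuals: int) -> float:
    """P(at least one copy of any listed allele appears in n diploid individuals).

    freqs_by_locus maps a locus name to the assumed frequencies of the alleles
    of interest at that locus.
    """
    copies = 2 * n_individuals
    p_none = prod((1 - sum(fs)) ** copies for fs in freqs_by_locus.values())
    return 1 - p_none
```

In use, one would rarefy each temporal sample of a species to the smallest sample size before comparing richness, and pass the frequencies of the private alleles together with the number of other-species individuals genotyped to `prob_detect_any`.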

Data from Lake Brienz, the least polluted lake and one with no loss in species or functional diversity (Table 1), contrast with and complement those from Lake Constance. For the three endemic species, global genetic differentiation (global FST) was historically (1952-70) identical to that in Lake Constance (0.166) but has not declined since (0.183 at present). Moreover, no significant increase in allelic richness was observed in any of the three species. Nevertheless, of 12 historically private alleles of the summer- and deep-spawning Coregonus albellus, 7 were also found in contemporary samples of other species, suggesting that gene flow between species has also occurred in this lake (see also ref. 33).

Additional support for speciation reversal comes from Lakes Zürich and Walen, which share a single-origin species pair: a small, deep-spawning species (Coregonus heglingus) and a large, shallow-spawning species (Coregonus duplex; Supplementary Fig. 5). Despite a common evolutionary history, pairwise FST between the species today in eutrophic Lake Zürich (0.041) is less than half of that in oligotrophic Lake Walen (0.110).

Table 1 | Whitefish species and functional diversity in 17 pre-alpine lakes.

Lake (no.) | Historical species | Contemporary species | Species loss (%) | Genetic species | Change in genetic differentiation (%) | Historical gill-raker range | Contemporary gill-raker range | Functional loss (%) | N historical gill-raker range | N contemporary gill-raker range | Mean egg survival (%) | Oxygenated lake depth (m) | Pmax (µg l⁻¹) | Maximum lake depth (m)
Lake Geneva (1) | 2 | 0 (1‡) | -100 | | | 15 | 10* | -33.30 | 61 | 24* | 84.4 | 254.17 | 90 | 309
Lake Neuchatel (2) | 2 | 1.5 | -25 | 1.5 | | 17 | 17 | 0.00 | ? | 341 | 43.8 | 153.00 | 50 | 152
Lake Murten (3) | 2 | 0 | -100 | | | 17 | 12* | -29.40 | ? | 30* | | 8.93 | 150 | 45.5
Lake Biel (4) | 2 | 2 | 0 | | | 17 | 19 | 11.80 | ? | 49 | 17.9 | 27.50 | 132 | 74
Lake Thun (5) | 5 | 5 | 0 | 5 | | 33 | 29 | -12.10 | 471 | 331 | 67.2 | 214.00 | 21 | 217
Lake Brienz (6) | 3 | 3 | 0 | 3 | 10 | 18 | 20 | 11.10 | >123 | 100 | | 243.97 | 17 | 261
Lake Sempach (7) | 2 | 0 | -100 | 1 | | 13 | 13* | 0.00 | >12 | 76* | 0.7 | 8.26 | 165 | 87
Lake Lucerne (8) | 4 | 4 | 0 | 4 | | 23 | 23 | 0.00 | 180 | 730 | 42.0 | 203.49 | 34 | 214
Lake Zug (9) | 2 | 1 | -50 | | | 21 | 11* | -47.62 | ? | 20* | 0.3 | 8.50 | 208 | 198
Lake Baldegg (10) | | 0 | -100 | | | 17 | | | ? | | | 4.34 | 517 | 66
Lake Hallwil (11) | | 0 | -100 | | | 17 | 11* | -35.30 | ? | 20* | 0.9 | 6.69 | 260 | 47
Lake Zürich (12) | 3 | 2 | -33 | 2 | | 18 | 18 | 0.00 | 76 | 66 | 35.3 | 9.72 | 119 | 136
Lake Walen (13) | 3 | 2 | -33 | 2 | | 19 | 20 | 5.30 | ? | 236 | 37.8 | 144.00 | 26 | 145
Lake Greifen (14) | | 0 | -100 | | | 11 | 11* | 0.00 | ? | 50* | | 0.00 | 507 | 32.3
Lake Pfäffiker (15) | | 0 | -100 | | | 15 | 9* | -40.00 | ? | 19* | | 0.00 | 367 | 35
Lake Constance (16) | 5 | 4 | -20 | 3 | -57 (-71.5†) | 36 | 26* | -27.80 | 694 | 79* | 31.4 | 248.91 | 87 | 254
Lake Sarnen (17) | 2 | 1 | -50 | | | 13 | 10* | -23.10 | ? | 20* | 59.4 | 47.33 | 21 | 52
Total/average | 41 | 25.5 | -38 | | -24 (-31†) | | | -13.78 | | 2,191 | | | |

The number of historically and presently observed phenotypically distinct, naturally recruiting whitefish species (Historical species and Contemporary species, respectively; ‡the present wild population observed in Lake Geneva does not correspond to either of the two described species); the percentage loss in species numbers (Species loss); the number of genetically distinct species observed today (Genetic species), in which 1.5 represents a species cline observed in Lake Neuchatel; the percentage change in global genetic differentiation (Change in genetic differentiation); the historically and currently observed gill-raker count range (Historical gill-raker range and Contemporary gill-raker range, respectively); the percentage change in functional diversity (Functional loss); the sample sizes for the historical (N historical gill-raker range) and contemporary (N contemporary gill-raker range) gill-raker analyses; the mean egg survival (Mean egg survival); the biologically available depth during eutrophication with more than 2.5 mg l⁻¹ dissolved oxygen (Oxygenated lake depth); the maximum total phosphorus concentration observed during the eutrophic period (Pmax); and the maximum lake depth (Maximum lake depth).

*Gill-raker ranges adjusted for unequal sample sizes (see Methods, Supplementary Information, Supplementary Table 6 and Supplementary Fig. 4).

†Number in brackets corresponds to the loss if the phenotypically extinct C. gutturosus is included in the analysis (see Supplementary Table 3).
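The Methods describe adjusting gill-raker ranges for unequal or unknown sample sizes (asterisked in Table 1) by drawing 100 virtual individuals from normal distributions built from each species' data. A minimal sketch, assuming one normal distribution per species given by a mean and standard deviation, 100 draws per species, and rounding to whole gill-raker counts; all of these details are assumptions, since the exact procedure is given in the Supplementary Information. `virtual_gill_raker_range` is a hypothetical name and the inputs are not the study's values.

```python
import random
from typing import List, Tuple


def virtual_gill_raker_range(species_stats: List[Tuple[float, float]],
                             n_virtual: int = 100, seed: int = 0) -> int:
    """Gill-raker count range among virtual individuals in one lake.

    species_stats holds a (mean, standard deviation) pair per species.
    n_virtual individuals are drawn from a normal distribution for each
    species and rounded to whole counts; the range is max - min.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    counts = [round(rng.gauss(mu, sd))
              for mu, sd in species_stats
              for _ in range(n_virtual)]
    return max(counts) - min(counts)
```

Because the result is a sample range, it varies with the seed; averaging over many seeds would reduce that Monte Carlo noise, although whether the authors did so is not stated here.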


Phenotypic signs of speciation reversal 

Speciation reversal is expected to erode interspecific phenotypic distinctiveness. Gill-raker counts provide a measure of heritable phenotypic trophic adaptation, and the contemporary range in gill-raker number and total body shape disparity of individuals in a lake are correlated (N = 15, slope = 0.49; R² = 0.36, P < 0.011; Supplementary Fig. 6). Across lakes, the distances of species means from the historical midpoint of species means in a lake have become significantly smaller over time (N = 19, t = 2.56, d.f. = 18, P = 0.020). Extant species have converged in moderately and strongly polluted lakes (N = 10, t = 2.43, d.f. = 9, P = 0.038; Fig. 2e) but not in weakly and mildly polluted lakes (N = 9, t = 1.06, d.f. = 8, P = 0.319). Relative contemporary disparity (see phenotypic tests of speciation reversal in Supplementary Information) was significantly lower in moderately and strongly polluted lakes than in weakly and mildly polluted lakes (N = 5 (moderately and strongly polluted lakes), N = 6 (weakly and mildly polluted lakes), t = 2.48, d.f. = 9, P = 0.035; Fig. 2e). The best general linear model contained maximum phosphorus concentration, maximum lake depth and oxygenated depth, with phosphorus having the largest and most significant effect (N = 10 lakes, R² = 0.85, P < 0.001, ΔAICc = 7.95; regression coefficient for phosphorus -0.79, P = 0.002).
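The convergence test can be sketched as a paired comparison of distances. This assumes that the "historical midpoint" lies halfway between the smallest and largest historical species means in a lake (the text does not define it further) and that distances from all lakes are pooled into one paired t-test, as the d.f. = N - 1 values above suggest. Function names and the example lake are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import List, Tuple


def midpoint_distances(hist_means: List[float],
                       cont_means: List[float]) -> Tuple[List[float], List[float]]:
    """Distances of historical and contemporary species means from the
    historical midpoint (assumed: halfway between the extreme historical means).
    hist_means[i] and cont_means[i] refer to the same species in one lake."""
    mid = (min(hist_means) + max(hist_means)) / 2.0
    return [abs(m - mid) for m in hist_means], [abs(m - mid) for m in cont_means]


def paired_t(before: List[float], after: List[float]) -> Tuple[float, int]:
    """Paired t statistic for before - after, with n - 1 degrees of freedom."""
    d = [b - a for b, a in zip(before, after)]
    return mean(d) / (stdev(d) / math.sqrt(len(d))), len(d) - 1


# Hypothetical lake: three species whose means moved towards the centre.
h, c = midpoint_distances([20, 30, 40], [22, 30, 38])
print(h, c)  # [10.0, 0.0, 10.0] [8.0, 0.0, 8.0]
```

A positive t means species now sit closer to the historical centre than they did, which is the direction of convergence expected under speciation reversal.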

In all but two radiations, species with few gill-rakers spawn in shallow water, whereas species with many gill-rakers spawn deeper. Speciation reversal predicts that the range in gill-raker number should contract from both ends of the distribution, whereas extinction through demographic decline of deep spawners predicts a contraction at the high end of the distribution. Consistent with speciation reversal, diversity has been lost from both ends of the distribution in each lake that experienced a range reduction (Table 1), independent of whether the two deep-spawning species with a low gill-raker count were included or not (mean loss at lower end is -3.4 or -2.7 gill-rakers, respectively; Wilcoxon test: Z = -2.54, N = 8, d.f. = 7, P = 0.011 for both cases; mean loss at upper end is -2.75, Z = -2.54, N = 8, d.f. = 7, P = 0.011 for both cases). This result was robust to the removal of Lakes Murten, Hallwil and Pfäffiker, where natural recruitment had ceased and stocks are maintained by stocking from hatcheries (mean loss at lower end is -3.4 or -2.4 gill-rakers, respectively; Wilcoxon test, Z = -2.03, N = 5, d.f. = 4, P = 0.042 for both cases; mean loss at upper end is -3; Z = -2.03, N = 5, d.f. = 4, P = 0.010 for both cases).
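The two competing predictions differ in which end of the gill-raker range shrinks. The hypothetical sketch below measures the change at each end of a lake's range, using the sign convention of the text (negative values mean range lost at that end), and labels the pattern. The labels are illustrative, not the authors' terminology.

```python
from typing import List, Tuple


def end_losses(hist_counts: List[int], cont_counts: List[int]) -> Tuple[int, int]:
    """Change at the lower and upper end of a lake's gill-raker range.

    Negative values mean that part of the range was lost at that end:
    lower = historical minimum - contemporary minimum,
    upper = contemporary maximum - historical maximum.
    """
    lower = min(hist_counts) - min(cont_counts)
    upper = max(cont_counts) - max(hist_counts)
    return lower, upper


def contraction_pattern(lower: int, upper: int) -> str:
    """Illustrative labels for the two predictions discussed in the text."""
    if lower < 0 and upper < 0:
        return "both ends (expected under speciation reversal)"
    if upper < 0:
        return "upper end only (expected if many-raker deep spawners declined)"
    if lower < 0:
        return "lower end only"
    return "no contraction"
```

For instance, a lake whose counts shrank from 20-40 to 23-37 gives (-3, -3) and is labelled as contracting at both ends.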

Thus, in cases in which several species persisted in sympatry after 
eutrophication, their phenotypes converged, and the extinction of 
species was associated with evolution to intermediate phenotypes in 
the remaining species. This is consistent with partial and complete 
speciation reversal, respectively. 


Discussion 


Our evidence suggests that anthropogenic eutrophication has led to speciation reversal in whitefish radiations by increasing gene flow between previously ecologically differentiated species. Although divergent natural selection could in principle maintain species differences in the face of increased gene flow, eutrophication seems to have altered reproductive and ecological niche spaces to the degree that selection cannot counteract the homogenizing effects of gene flow. It is possible that accidental hybridization in hatcheries has contributed to interspecific gene flow. However, whereas reductions in genetic differentiation were related to eutrophication, hatcheries operate on all lakes, so hatchery hybridization alone cannot explain the observed patterns of diversity loss.

The study lakes have lost 38% of species diversity, 14% of functional diversity and 28% of functional disparity among species. At least eight endemic species and seven distinct populations of extant species have become extinct (Table 1 and Supplementary Table 1). Only 4 of 17 lakes suffered no species loss. Among remaining species, genetic differentiation is reduced. This loss of species richness, phenotypic diversity and genetic differentiation occurred largely unnoticed despite the commercial importance of whitefish. Similarly large losses of whitefish diversity may have occurred in other lakes outside Switzerland (Supplementary Table 1), and the extinction of endemic char species pairs in some of the same lakes could have involved similar mechanisms. Finally, we note that similar patterns of diversity loss have been observed in several other taxa.

Loss of biodiversity through speciation reversal may be underappreciated for two reasons. First, the process can be difficult to detect because it does not require changes in distribution or abundance but can manifest through subtle changes in patterns of variation within multi-species assemblages. Second, speciation reversal is a potentially rapid process, by which species can collapse in just a few generations. Compelling tests of speciation reversal will often require historical samples with DNA of sufficient quality. Our results add to a growing body of evidence suggesting that not only freshwater fish radiations but also terrestrial radiations are threatened by anthropogenic activities that disrupt the ecological conditions and evolutionary processes that promote adaptive radiation. There is evidence from lake ecosystems that eutrophication-mediated speciation reversal may threaten diversity simultaneously at interacting trophic levels, and the effects on food webs require investigation. If the loss of ecologically dominant species, such as planktivorous fish, affects other ecosystem components, the impacts of speciation reversal may extend beyond the simple loss of species. Regardless of the mechanistic details, preserving ecosystem services requires maintaining functional ecosystems, which in turn requires protecting the ecological conditions and evolutionary mechanisms that generate and maintain species diversity.

Figure 3 | Whitefish diversity explained by environmental variables. a, b, Historical whitefish diversity as species numbers (a) and the range in gill-raker numbers (b), plotted against maximum lake depth (Dmax). c, d, Contemporary diversity, measured as species numbers (c) and the range in gill-raker numbers (d), plotted against oxygenated depth (DO2,min). e, The number of contemporary genetically differentiated species plotted against maximum lake depth. f, The relationship between the maximum phosphorus concentration (Pmax) and the viability of the whitefish eggs in 12 lakes. The dashed lines represent the 95% confidence intervals for the regression line.


METHODS SUMMARY 
Between 2004 and 2010 we collected 2,449 whitefish from 16 lakes. Muscle tissue was preserved in 100% ethanol for genetic analyses. The first gill arch was removed from 2,191 individuals for gill-raker counts (Table 1). Scale samples were used to analyse historical trends in genetic differentiation of species in Lakes Constance (1926-50: N = 133; 1969-80: N = 92) and Brienz (1952-70: N = 66). We collected data on historical species richness in 17 lakes and on contemporary richness in 16 lakes. We determined three different metrics of historical diversity and four metrics of contemporary diversity for each lake assemblage: first, species richness, identified using morphology, spawning ecology and taxonomic literature; second, the observed range in gill-raker number, a measure of heritable functional diversity; third, genetic species differentiation, using genotypes based on ten microsatellite loci (for methods see ref. 23); fourth, phenotypic distinctiveness of species using gill-raker mean counts. When possible, individual assignments to species were based on a Bayesian population inference algorithm (STRUCTURE version 2.3.3; 30,000 burn-in and 300,000 Markov chain Monte Carlo steps). Environmental variables for each lake were obtained from the literature and government databases. Maximum phosphorus concentration corresponds to the highest value observed between 1951 and 2004. Oxygenated depth was the minimum depth range observed during the eutrophic phase with the water containing at least 2.5 mg l⁻¹ dissolved oxygen (see environmental variables in Supplementary Information). Whitefish eggs were collected from the lake bottom in twelve lakes on several samplings between 1968 and 2008, and the percentage of normally developing eggs was calculated.
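The environmental predictors and the egg-survival measure can be derived from monitoring data roughly as follows. This sketch assumes total-phosphorus values keyed by year and oxygen profiles given as (depth, concentration) pairs, and it interprets oxygenated depth as the depth down to which every sampled layer holds at least 2.5 mg l⁻¹. That interpretation, the data layout and all function names are assumptions, since the details are in the Supplementary Information.

```python
from typing import Dict, List, Tuple

O2_THRESHOLD = 2.5  # mg per litre, the threshold stated in the text


def p_max(tp_by_year: Dict[int, float], first: int = 1951, last: int = 2004) -> float:
    """Highest total-phosphorus value recorded between first and last (inclusive)."""
    return max(tp for year, tp in tp_by_year.items() if first <= year <= last)


def oxygenated_depth(profile: List[Tuple[float, float]],
                     threshold: float = O2_THRESHOLD) -> float:
    """Deepest sampled depth (m) reached before the first reading below threshold.

    profile holds (depth_m, dissolved_o2_mg_per_l) pairs in any order.
    """
    deepest = 0.0
    for depth, o2 in sorted(profile):
        if o2 < threshold:
            break
        deepest = depth
    return deepest


def min_oxygenated_depth(profiles: List[List[Tuple[float, float]]]) -> float:
    """D_O2,min: the smallest oxygenated depth over the eutrophic-phase profiles."""
    return min(oxygenated_depth(p) for p in profiles)


def percent_normal_eggs(normal: int, total: int) -> float:
    """Percentage of normally developing eggs in a sample."""
    return 100.0 * normal / total
```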


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS

Sampling. Between 2004 and 2010 we collected 2,449 whitefish from 16 lakes. We collected at least 20 individuals of each known species from 16 lakes, except C. heglingus in Lake Zürich (17 individuals) and C. sp. 'Felchen' in Lake Thun, which could not be obtained. In most lakes, we collected fish directly on the spawning grounds. In six lakes (Lakes Sempach, Walen, Lucerne, Thun, Brienz and Neuchatel), fish were collected several times from many different spawning sites to distinguish intraspecific genetic population structure from species structure. No geographical or temporal differences within species could be observed (ref. 23 and B.L.-H., personal communication). We sampled systematically along water-depth gradients during the spawning period in Lakes Neuchatel and Lucerne (B.L.-H., P.V., A.G.H., K. Lucek and O.S., unpublished data). The length, weight and sex of every fish were recorded. Muscle tissue was removed and preserved in 100% ethanol for DNA analysis. The first gill arch was removed from 2,191 individuals for gill-raker counting (Table 1). Scale samples were used for molecular genetic analyses of historical trends in species differentiation in Lake Constance (1926-50: N = 133; 1969-80: N = 92) and Lake Brienz (1952-70: N = 66).

Historical and contemporary diversity. We collected data on historical and
contemporary diversity in 17 and 16 Swiss lakes, respectively. We determined
three metrics of historical and four of contemporary diversity (details in
Supplementary Information). First, contemporary species richness was determined
using the same traits and procedures as Steinmann in 1950, who determined
historical species richness using morphological and meristic traits and
information on spawning ecology. Second, contemporary ranges in gill-raker
numbers were obtained from our recent samples (see above), and historical
gill-raker data were taken from Steinmann. In whitefish, gill-raker number is related
to feeding ecology and is highly heritable (heritability 0.79), and thus provides an
ecologically meaningful and taxonomically independent (Supplementary Fig. 7) estimate of
heritable functional diversity. To enable comparisons between historical and
contemporary data when sample sizes were unequal or (for historical data) unknown, we
used the available data for each species to create normal distributions from which 100
virtual individuals were then randomly sampled. Third, genetic species
differentiation was determined by genotyping historical and contemporary samples at 10
microsatellite loci. Details of the laboratory methods used for contemporary
samples are given in ref. 23. Whenever possible, the identification of sympatric
genetically differentiated species and individual assignment were performed using
the Bayesian population inference algorithm in STRUCTURE version 2.3.3 (ref. 41)
(30,000 burn-in and 300,000 MCMC steps). However, STRUCTURE is typically
inefficient when FST < 0.05 (unless many loci are sampled). This was the case
between some species in Lake Lucerne and in Lake Zürich. In these lakes, species were
identified using a combination of morphology and spawning ecology, and this identification
was confirmed a posteriori by significant FST values between species
sampled in sympatry. We calculated the extent of contemporary genetic
differentiation among species in eight lakes, and the historical differentiation in two lakes:
one that was moderately affected by eutrophication (Lake Constance) and one that was
little affected (Lake Brienz). Fourth, phenotypic distinctiveness of species was determined
using gill-raker mean counts for each species in each lake.
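The resampling step used to compare historical and contemporary gill-raker ranges can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that each species is summarized by a (mean, standard deviation) pair, that the 100 virtual individuals per species are pooled per lake before the range is taken, and that simulated values are rounded to whole rakers. The function names and all numbers are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple


def virtual_gill_rakers(species_stats: Sequence[Tuple[float, float]],
                        n_virtual: int = 100,
                        seed: Optional[int] = None) -> List[int]:
    """Pool n_virtual simulated gill-raker counts per species, drawn from
    normal distributions given as (mean, sd) pairs. Counts are rounded to
    integers (assumption: gill-raker number is a count)."""
    rng = random.Random(seed)
    pooled: List[int] = []
    for mean, sd in species_stats:
        pooled.extend(round(rng.gauss(mean, sd)) for _ in range(n_virtual))
    return pooled


def gill_raker_range(counts: Sequence[int]) -> int:
    """Range (max - min) of gill-raker counts, used as the functional-diversity metric."""
    return max(counts) - min(counts)


# Hypothetical (mean, sd) values for the species of one lake.
historical = [(22.0, 2.0), (31.0, 2.5), (38.0, 3.0)]
contemporary = [(27.0, 3.0), (33.0, 3.0)]
print(gill_raker_range(virtual_gill_rakers(historical, seed=1)),
      gill_raker_range(virtual_gill_rakers(contemporary, seed=1)))
```

Drawing the same number of virtual individuals per species matters because the observed range of a sample grows with sample size. Fixing n makes historical and contemporary ranges comparable even when the original sample sizes differ or are unknown.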

DNA extraction and PCR amplification of DNA from historical material.
Total DNA was extracted from historical dried scales using a modified standard
phenol-chloroform-ethanol extraction method. DNA quantity was measured
using a Nanodrop ND-1000 (Nanodrop Technologies) spectrophotometer, and all
samples containing less than 20 ng µl⁻¹ DNA were excluded from further
analysis. Polymerase chain reaction (PCR) was performed according to the
QIAGEN Multiplex standard protocol with an annealing temperature (Ta) of
57 °C and 35 cycles (Sets 1 and 3) or 45 cycles (Sets 2 and 4). Denatured fragments
were resolved on an automated DNA sequencer (ABI 3100). Genotypes were
determined with GENEMAPPER 4.0 (ABI) software and checked visually. Each
sample was amplified twice in separate PCRs. When both genotypes were identical,
we used these genotypes for further analysis (81.5% of all genotypes). When both
genotypes were missing, no further attempt was made to genotype a sample at that
locus (8.9% of all genotypes). Finally, when only one of the two genotypes could be
determined (9.6%), a third and, when needed, a fourth separate PCR was
performed to confirm the genotype. To estimate reproducibility, 28 samples were
independently extracted and the procedure described above was repeated. In total, 240
genotypes were compared and 8 mismatches were found (reproducibility, 96.7%).
Only individuals with at least six successfully genotyped loci were considered
for population genetic data analysis. The level of missing data at loci with large
fragment lengths was considerable in historical populations (40.5% for locus
CoCl-61, 26.4% for locus CoCl-10 and 14.5% for locus CoCl-45). Separate analyses
excluding these loci yielded very similar results (data not shown). Therefore, all analyses
were performed including all loci.
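The replicate-PCR rules above can be sketched as a small decision function. This is a minimal illustration, assuming that a genotype is a pair of allele sizes with `None` marking a failed amplification. The text does not say how two conflicting successful calls were handled, so the `"conflict"` branch is hypothetical. All function names are illustrative.

```python
from typing import Optional, Sequence, Tuple

Genotype = Optional[Tuple[int, ...]]  # allele sizes; None = failed PCR


def _normalise(g: Genotype) -> Genotype:
    # treat (a, b) and (b, a) as the same diploid genotype
    return None if g is None else tuple(sorted(g))


def consensus_genotype(replicates: Sequence[Genotype]) -> Tuple[Genotype, str]:
    """Apply the replicate rules to one sample at one locus.

    replicates[0:2] are the two standard PCRs; any further entries are the
    third/fourth PCRs run when only one of the first two gave a genotype."""
    reps = [_normalise(g) for g in replicates]
    first, second = reps[0], reps[1]
    if first is None and second is None:
        return None, "missing"            # no further attempt is made
    if first is not None and second is not None:
        if first == second:
            return first, "accepted"
        return None, "conflict"           # not covered by the text; hypothetical handling
    called = [g for g in reps if g is not None]
    if len(called) >= 2 and all(g == called[0] for g in called):
        return called[0], "confirmed"     # an extra PCR matched the single call
    return None, "unresolved"


def passes_locus_filter(calls: Sequence[Genotype], min_loci: int = 6) -> bool:
    """Keep individuals with at least min_loci successfully genotyped loci."""
    return sum(g is not None for g in calls) >= min_loci


def reproducibility(n_compared: int, n_mismatches: int) -> float:
    """Percentage of re-extracted genotypes that matched the original calls."""
    return 100.0 * (n_compared - n_mismatches) / n_compared


print(f"{reproducibility(240, 8):.1f}%")  # counts quoted in the text
```

With the counts given in the text, (240 − 8) / 240 ≈ 0.967, which matches the reported reproducibility of 96.7%.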

Environmental variables. Lake depths (m) and maximum phosphorus content
(Ptot, mg m⁻³) were obtained from ref. 42. O₂ depth profiles (mg l⁻¹)
(Supplementary Table 7) were obtained from the Federal Office for the
Environment (FOEN), the Swiss Federal Institute of Aquatic Science and
Technology (EAWAG) and the Internationale Gewässerschutzkommission für
den Bodensee (IGKB). For maximum phosphorus concentration, we took the
highest value observed in time series covering the period from 1951 to
2004, which includes the onset and peak of the eutrophic phase and the
re-oligotrophication that began in the 1980s. The maximum depth of a lake
was measured from the lake surface to its deepest point.
Oxygenated depth was calculated as the minimum water depth range observed
during the eutrophic phase with the water containing at least 2.5 mg l⁻¹ dissolved
oxygen. The limit of 2.5 mg l⁻¹ was chosen to correspond to the critical oxygen
level at which embryo development is negatively affected.
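The two lake-level variables can be sketched as follows. This assumes a specific data layout and a specific reading of the definitions, neither of which is given in the text. Each O₂ profile is taken to be a list of (depth, concentration) measurements from one sampling date in the eutrophic phase. "Oxygenated depth" is read as the depth from the surface down to which every measurement meets the 2.5 mg l⁻¹ threshold, and the per-lake value is the minimum across profiles. The function names and example values are hypothetical.

```python
from typing import Iterable, Sequence, Tuple

Profile = Sequence[Tuple[float, float]]  # (depth in m, dissolved O2 in mg/l)


def oxygenated_depth(profile: Profile, threshold: float = 2.5) -> float:
    """Deepest sampled depth reached from the surface before O2 first falls
    below `threshold` (mg/l); returns 0.0 when no sub-surface sample qualifies."""
    deepest = 0.0
    for depth, o2 in sorted(profile):  # sort by depth
        if o2 < threshold:
            break
        deepest = depth
    return deepest


def lake_oxygenated_depth(profiles: Iterable[Profile], threshold: float = 2.5) -> float:
    """Minimum oxygenated depth over all profiles from the eutrophic phase."""
    return min(oxygenated_depth(p, threshold) for p in profiles)


def max_phosphorus(series: Iterable[Tuple[int, float]],
                   start: int = 1951, end: int = 2004) -> float:
    """Highest total-phosphorus value (mg m^-3) observed within [start, end]."""
    return max(value for year, value in series if start <= year <= end)


# Hypothetical profiles from two sampling dates.
p1 = [(0, 10.0), (10, 8.0), (20, 3.0), (30, 1.0)]
p2 = [(0, 9.0), (10, 6.0), (20, 2.0)]
print(lake_oxygenated_depth([p1, p2]))
```

Taking the minimum across dates captures the worst conditions during the eutrophic phase, which are presumably the ones that matter for eggs lying on the lake bottom.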

Egg survival data. Whitefish eggs were collected from the lake bottom in 12 lakes 
on several occasions between 1968 and 2008. Sampling was done in each lake 
between early January and early March, before the anticipated beginning of mass 
hatching of the corresponding whitefish species. As a comparative measure of egg 
viability, we calculated the percentage of eggs that developed normally. Details of 
the sampling methods can be found in ref. 43 and in Supplementary Information. 

Statistical analyses. We used least-squares regressions and an information-theoretic
approach to select the models that best explain the relationship between
predictor and response variables (Supplementary Information). Before comparing
data, we tested for significant deviations from normal distributions using a
Shapiro-Wilk test. For data that did not significantly deviate from normality, a
Welch's t-test (unpaired data) or a paired t-test was used. When the data significantly
deviated from normality, a Wilcoxon signed-rank test (paired data) or a Mann-Whitney U
test (unpaired data) was used. All tests were two-tailed.
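The test-selection rule above can be sketched with functions from SciPy's `scipy.stats`. This is a minimal illustration with three assumptions not stated in the text: "standard" comparisons are read as Welch's unequal-variance t-test, the Shapiro-Wilk screen uses a significance level of 0.05, and for paired data normality is checked on the within-pair differences. `compare_groups` is a hypothetical helper name.

```python
import numpy as np
from scipy import stats


def compare_groups(a, b, paired=False, alpha=0.05):
    """Pick a two-sided test after a Shapiro-Wilk normality screen.

    Returns (test_name, p_value)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if paired:
        # assumption: normality of paired data is judged on the differences
        if stats.shapiro(a - b).pvalue > alpha:
            return "paired_t", stats.ttest_rel(a, b).pvalue
        return "wilcoxon_signed_rank", stats.wilcoxon(a, b, alternative="two-sided").pvalue
    if stats.shapiro(a).pvalue > alpha and stats.shapiro(b).pvalue > alpha:
        # equal_var=False gives Welch's t-test
        return "welch_t", stats.ttest_ind(a, b, equal_var=False).pvalue
    return "mann_whitney_u", stats.mannwhitneyu(a, b, alternative="two-sided").pvalue


# Illustrative data: evenly spaced normal quantiles versus a strongly skewed series.
q = np.linspace(0.02, 0.98, 40)
normal_like = stats.norm.ppf(q)
skewed = np.exp(np.linspace(0.0, 6.0, 40))
print(compare_groups(normal_like, normal_like + 1.0))
print(compare_groups(skewed, 2.0 * skewed))
```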
Prions are a common mechanism for
phenotypic inheritance in wild yeasts


Randal Halfmann, Daniel F. Jarosz, Sandra K. Jones, Amelia Chang, Alex K. Lancaster & Susan Lindquist


The self-templating conformations of yeast prion proteins act as epigenetic elements of inheritance. Yeast prions might 
provide a mechanism for generating heritable phenotypic diversity that promotes survival in fluctuating environments 
and the evolution of new traits. However, this hypothesis is highly controversial. Prions that create new traits have not 
been found in wild strains, leading to the perception that they are rare ‘diseases’ of laboratory cultivation. Here we 
biochemically test approximately 700 wild strains of Saccharomyces for [PSI+] or [MOT3+], and find these prions in
many. They conferred diverse phenotypes that were frequently beneficial under selective conditions. Simple meiotic 
re-assortment of the variation harboured within a strain readily fixed one such trait, making it robust and 
prion-independent. Finally, we genetically screened for unknown prion elements. Fully one-third of wild strains 
harboured them. These, too, created diverse, often beneficial phenotypes. Thus, prions broadly govern heritable 
traits in nature, in a manner that could profoundly expand adaptive opportunities. 


The heritable variation that drives new forms and functions is 
generally ascribed to mutations in the genetic code. We previously 
proposed an entirely different pathway for creating heritable phenotypic 
diversity, through which the inheritance of new traits can precede the
genetic changes that ultimately hardwire them. The mechanism for this 
seemingly paradoxical flow of information resides in epigenetic switches 
encoded entirely by self-perpetuating changes in protein structure, 
known as prions. 

The best studied prion is the yeast translation-termination factor 
Sup35. Like other prions, Sup35 carries a prion-determining domain 
(PrD) that is dispensable for the protein’s normal function. This PrD 
occasionally adopts an amyloid conformation. When it does, that 
amyloid perpetuates itself by templating the same conformation to 
the PrDs of other Sup35 molecules. This sequesters most Sup35 into 
insoluble fibres. The ensuing reduction in translation-termination
activity increases stop codon read-through, producing a variety of 
new traits that depend upon previously cryptic genetic variation. 

Just as the mitotic apparatus ensures inheritance of chromosomally 
determined traits, the prion-partitioning function provided by 
Hsp104 (refs 3, 4) ensures inheritance of prion phenotypes. Hsp104 
is a molecular machine that severs prion fibres, allowing replicating 
prion templates to be faithfully inherited by daughter cells. The prion 
formed by Sup35 is known as [PSI+], brackets denoting its cytoplasmic
inheritance and capital letters its dominant phenotypes. 

Cells expressing Sup35 in the non-prion [psi−] state spontaneously
switch to [PSI+] at a frequency of about 1 in 10⁶ (refs 5, 6). We have
proposed that [PSI+] provides a beneficial ‘bet-hedging’ mechanism
to enhance survival in the face of fluctuating environments: by the
time a yeast colony has reached appreciable size, a few [PSI+] cells will
have appeared, expressing heritable new traits. If the trait is
detrimental, only a few individuals in a large population will be lost.
However, if it is advantageous, those few cells might ensure survival
under conditions in which the population would otherwise perish.
[PSI+] is also lost sporadically. This guarantees that [psi−] cells
will arise in [PSI+] colonies, providing a complementary survival
advantage.
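The claim that "a few" [PSI+] cells appear by the time a colony is large follows from simple arithmetic. Here is a minimal sketch under two assumptions: the switching frequency is about 10⁻⁶ per newly produced cell, the order of magnitude usually reported for spontaneous [PSI+] appearance, and every cell switches independently. A colony grown from one cell to N cells involves N − 1 divisions, so the expected number of independent switching events is about 10⁻⁶ × (N − 1). The colony size below is a hypothetical round number. The count excludes the descendants of switched cells, so it underestimates the number of [PSI+] cells present.

```python
def expected_switch_events(final_cells: int, rate: float = 1e-6) -> float:
    """Expected number of independent [psi-] -> [PSI+] switching events while a
    colony grows from one cell to `final_cells` cells (final_cells - 1 divisions).
    Simplifying assumption: each newly produced cell switches independently
    with probability `rate`; descendants of switched cells are not counted."""
    return rate * (final_cells - 1)


def prob_at_least_one_switch(final_cells: int, rate: float = 1e-6) -> float:
    """Probability that at least one switching event occurs during colony growth."""
    return 1.0 - (1.0 - rate) ** (final_cells - 1)


colony = 10**7 + 1  # hypothetical colony size
print(f"expected switching events: {expected_switch_events(colony):.1f}")
print(f"P(at least one [PSI+] lineage): {prob_at_least_one_switch(colony):.5f}")
```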

A particularly attractive feature of this mechanism is that it provides
immediate access to genetically complex traits. Regions downstream
of stop codons frequently accumulate genetic variation. [PSI+]-mediated
read-through allows this previously cryptic variation to have
biological consequences at multiple loci simultaneously. The complex
traits produced by this prion would be less likely to evolve if the
individually contributing mutations had to be selected for as they arose. In
the long run, reduced translational fidelity should be detrimental.
However, advantageous phenotypes initially dependent on [PSI+]
might be assimilated by various means, allowing the prion to be lost
and the trait maintained.

Several lines of evidence support this hypothesis. First, mathematically,
even an infrequent selective advantage for [PSI+] would be sufficient to
maintain Sup35’s prion-switching capacity. Second, the sequence of
Sup35’s PrD is highly divergent but has retained, for at least 500 million
years, two unusual features that regulate bi-stable inheritance of prion
and non-prion phenotypes. An extreme amino acid bias in one
segment drives the PrD into a self-templating prion amyloid, whereas an
immediately adjacent charged segment stabilizes it in the soluble
non-prion state. Third, the rates at which cells switch into and out of the
prion state increase when cells are not well suited to their
environments and new phenotypes have a better chance of being beneficial.
Increased switching is a direct consequence of the effects that diverse
environmental stresses have on protein folding and homeostasis,
and also fulfils a critical theoretical prediction for such an evolvability
function.

In addition to Sup35, at least two dozen other proteins can form
prions that are transmitted through the prion-partitioning activity of
Hsp104 (refs 14, 15) in laboratory yeast. These prions are strikingly
enriched in transcription factors and RNA-processing proteins that
are well situated to transduce genetic variation into phenotypic effect.
They too, therefore, might enable the inheritance of diverse biological
traits, enhancing survival in fluctuating environments. However, as
attractive as such ideas may be, and as intensely studied as yeast
prions have been, their proposed adaptive value remains highly
controversial. Indeed, prions are often categorized as rare ‘diseases’ of
yeast or mere artefacts of laboratory cultivation. A key argument is
that [PSI+] and other prions with phenotypic consequences have not
been found in wild strains, despite attempts to find them. Here we
establish the natural biological importance of prions through
biochemical and biological analyses of hundreds of wild strains.


[PSI+] occurs in wild strains


To search for [PSI+] in wild strains, we took advantage of the unusual
stability of prion amyloid assemblies in ionic detergents, which
enables their identification by semi-denaturing detergent-agarose
gel electrophoresis (SDD-AGE). We modified the technique to
enable high-throughput detection. Ultimately we analysed 690 yeast
strains from diverse ecological niches (Supplementary Table 1).
Amyloid polymers of Sup35 were present in ten (Fig. 1a, Supplementary
Table 1 and Supplementary Fig. 1).

To ensure that these strains were not simply derived from a recent, 
prion-containing common ancestor, we sequenced the genomes of 
two. Over 25,000 single nucleotide polymorphisms established their 
independent origins (Supplementary Fig. 2). We also sequenced the 
SUP35 gene in several of the strains, which established that they, too, 
had independent origins (Supplementary Table 2). 

Do the Sup35 amyloids in these strains represent true prions?
Owing to its central role in the inheritance of prion templates, even
transient inhibition of Hsp104’s protein-remodelling activity heritably
‘cures’ cells of their prion elements. We inhibited Hsp104 function by
growth on medium containing low concentrations of guanidine
hydrochloride (GdHCl), which selectively inhibits its ATPase activity,
and then plated cells back to media without GdHCl. In all cases this
eliminated the amyloid (black arrows, Fig. 1b and Supplementary
Fig. 1b). To ensure that curing was not due to an off-target effect of
GdHCl, we used a genetic approach: transiently expressing a
dominant negative Hsp104 variant, Hsp104DN, on a plasmid marked
with antibiotic resistance. (Wild strains contain no auxotrophies.)
This also cured cells of the amyloid, confirming their prion-based
inheritance (black arrows, Fig. 1b and Supplementary Fig. 1b, c).

Figure 1 | Identification and verification of prions in wild yeast.
a, Representative SDD-AGE blot of wild yeast isolates probed with antibodies
against Sup35 and Rnq1. Amyloid polymers produce characteristic smears.
SDD-AGE does not reliably detect monomeric proteins. b, Transient
inactivation of Hsp104 by GdHCl or expression of a dominant negative mutant
of Hsp104 eliminates these amyloids, indicating that they are [PSI+] and
[RNQ+] prions.


Laboratory culture is not responsible 


Might these prions have arisen simply as an artefact of laboratory
culture conditions? In archiving wild strains, great care is taken to
maintain their wild character (L. Bisson, personal communication).
To determine directly whether the conditions used might have inadvertently
selected for the de novo appearance of [PSI+], we compared growth of
the archived strains with their cured derivatives on all of the relevant
media. No growth advantage was found for [PSI+] on any of these
media (yeast potato dextrose, YPD, YM broth, FM broth, wort agar or
Wallerstein nutrient agar) in any of the strains, and it was sometimes
detrimental (Supplementary Table 3). However, in strain 5672 [PSI+]
produced an extreme selective advantage on synthetic grape must, a
medium that recapitulates conditions of early wine fermentation. This
suggests the prion was advantageous in the strain’s natural ecological
niche (Supplementary Fig. 3). In any case, this and several other
experiments (Supplementary Information) indicated that prions harboured
by the wild strains almost certainly originated in the yeasts’ natural
environments.


Wild [PSI+] is associated with [RNQ+]

In the laboratory, Sup35’s switch to the prion state strongly depends
on the prion-enabling factor [RNQ+]. [RNQ+] is itself a prion
formed by the Rnq1 protein. [RNQ+] is the only prion previously
known to exist in wild strains. We screened our collection for
[RNQ+] amyloids, finding them in 43 strains (Fig. 1b). These, too,
depended on the prion-partitioning factor Hsp104 (arrows, Fig. 1b
and Supplementary Fig. 1). The correlation between [RNQ+] and
[PSI+] (P < 0.0001, Fisher’s exact test) was striking: all the [PSI+]
strains contained [RNQ+]. This strongly indicates that [RNQ+] acts
as a prion-inducing factor for [PSI+] in nature.
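The P value for the [RNQ+]/[PSI+] association comes from a 2 × 2 contingency table. Here is a minimal sketch that assumes the table implied by the counts in the text: 690 strains screened, ten [PSI+], 43 [RNQ+], and all ten [PSI+] strains also [RNQ+]. The exact table the authors tested is not shown here. The function sums the hypergeometric probabilities of every table with the same margins that is no more probable than the observed one. This is the usual two-sided definition and the same one `scipy.stats.fisher_exact` uses. The helper names are hypothetical.

```python
from math import comb


def hypergeom_pmf(k: int, total: int, successes: int, draws: int) -> float:
    """P(X = k) when `draws` items are taken without replacement from `total`
    items, of which `successes` are marked."""
    return comb(successes, k) * comb(total - successes, draws - k) / comb(total, draws)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Here rows are [PSI+] / [psi-] and columns are [RNQ+] / [rnq-]."""
    row1, col1, total = a + b, a + c, a + b + c + d
    p_obs = hypergeom_pmf(a, total, col1, row1)
    lo, hi = max(0, row1 + col1 - total), min(row1, col1)
    p = 0.0
    for k in range(lo, hi + 1):
        pk = hypergeom_pmf(k, total, col1, row1)
        if pk <= p_obs * (1 + 1e-7):  # small tolerance for floating-point ties
            p += pk
    return min(p, 1.0)


# Hypothetical table reconstructed from the counts quoted in the text.
psi_rnq, psi_only = 10, 0
rnq_only = 43 - psi_rnq
neither = 690 - psi_rnq - psi_only - rnq_only
print(fisher_exact_two_sided(psi_rnq, psi_only, rnq_only, neither))
```

Because all ten [PSI+] strains fall in the [RNQ+] column, the observed table is the most extreme one possible for these margins. The two-sided P value is therefore tiny, well below the 0.0001 bound reported.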


[PSI+] transforms natural genetic variation


Do wild prions generate phenotypic diversity from otherwise-cryptic 
natural variation? We compared growth of the wild [PSI+] strains
with that of their cured derivatives in four carbon sources, under 
osmotic, oxidative, pH or ethanol stresses, and in the presence of 
antifungal drugs or DNA damaging agents. We also assessed their 
ability to invade the growth substratum. 

Prion curing produced many phenotypic changes that varied with 
the genetic background (Fig. 2a). For example, strain UCD824,
isolated from white wine, was resistant to acidic conditions and to 
fluconazole. UCD9339, isolated from Lambrusco grapes, was resistant 
to the DNA-damaging agent 4-nitroquinoline 1-oxide (4-NQO). 
These beneficial phenotypes were greatly reduced by prion curing. 
UCD978, isolated from Beaujolais wine, was sensitive to the oxidative 
stressor tBOOH and became more resistant on curing. This same 
strain normally penetrated the agar surface, but this ability was lost 
after prion curing (Fig. 3a). Thus, in UCD978 the prion produced 
a trade-off, creating traits that were potentially detrimental or 
beneficial, depending on the circumstances. 

GdHCl and Hsp104DN cure cells of other Hsp104-dependent
prions in addition to [PSI+]. To determine whether such curable
phenotypes arose from [PSI+] itself, we transformed the ten strains with
a plasmid expressing a Sup35 variant lacking the PrD (Sup35ΔPrD).
This protein is immune to [PSI+]-mediated sequestration and restores
translation termination in [PSI+] cells without altering other
prions. In most cases Sup35ΔPrD produced the same changes as curing
with GdHCl and Hsp104DN (Fig. 2b, Fig. 3b; see Supplementary
Information for discussion). Thus, most of the original traits were due
to [PSI+].

Figure 2 | Prion-contingent phenotypes of [PSI+] isolates. a, Wild [PSI+]
strains show diverse phenotypes that are eliminated by transient Hsp104
inhibition. Growth curves for wild strains and cured derivatives in the indicated
selective condition are shown as closed blue circles and open red circles, respectively.
Growth in YPD is presented for each wild strain (closed grey circles) and its
cured derivative (open grey circles) as a control. OD600, optical density measured
at 600 nm. b, Phenotypes of the wild [PSI+] strains were also eliminated by
expression of Sup35ΔPrD. Growth curves for wild strains expressing Sup35 or
Sup35ΔPrD in the indicated selective condition are shown as closed blue circles and
open red circles, respectively. Growth in YPD for the indicated wild strain
expressing Sup35 (closed grey circles) or Sup35ΔPrD (open grey circles) is
presented as a control. Error bars are shown for all points and represent the
standard deviation of four independent biological replicates.


Fixation of a [PSI+]-dependent phenotype

When laboratory strains of diverse backgrounds are crossed and
sporulated, meiotic re-assortment of the genetic variants they contain
can lead to the fixation of a prion trait. That is, whereas the trait
initially depends on the prion, it can become prion-independent.
Might this mechanism allow wild strains to drive prion-dependent traits
to fixation? Wild yeasts frequently harbour considerable heterozygosity,
and sequencing revealed that the [PSI+] strain UCD978 was highly
polymorphic. We asked whether simple re-assortment of these polymorphisms
could fix a [PSI+]-dependent trait.

Figure 3 | Genetic assimilation of the [PSI+]-dependent adhesive
phenotype in meiotic progeny of UCD978. a, [PSI+] allows the wine yeast
UCD978 to adhere to agar surfaces. a, b, Adhesion is eliminated by GdHCl (a) or
by expression of a non-aggregating version of Sup35, Sup35ΔPrD (b). c, Meiotic
progeny of strain UCD978 show a diversity of phenotypes. In some spores the
adhesive phenotype was assimilated and remained even after the [PSI+] prion
was cured. In others the adhesive phenotype retained [PSI+] dependence.
Finally, some lost the phenotype altogether, irrespective of [PSI+] status.

Thirty random haploid progeny of UCD978 were tested for agar
adhesion before and after curing. Five retained [PSI+]-dependent
adhesion; twenty were no longer adhesive, with or without [PSI+];
and five remained adhesive even after [PSI+] curing (Fig. 3c). Given the
frequency of fixation, it probably required the re-assortment of a few
different polymorphisms. But clearly, the naturally occurring genetic
variation present in this strain was alone sufficient to fix this trait.
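The inference that fixation "probably required the re-assortment of a few different polymorphisms" can be made concrete with a toy model. This is our illustration, not the authors' analysis. It assumes k unlinked loci, each heterozygous in UCD978, and that a spore shows prion-independent adhesion only if it inherits the favourable allele at all k loci, so the expected fixed fraction is (1/2)^k. With 5 of 30 spores fixed, a binomial likelihood favours k = 3 over k = 2 only weakly. The data are therefore consistent with "a few" loci rather than pinning down a number.

```python
from math import comb, log


def fixation_fraction(n_loci: int) -> float:
    """Expected fraction of haploid spores that inherit the favourable allele at
    all n_loci unlinked, heterozygous loci (toy model)."""
    return 0.5 ** n_loci


def binomial_log_likelihood(successes: int, trials: int, p: float) -> float:
    """Log-probability of `successes` out of `trials` under a binomial(p) model."""
    return log(comb(trials, successes)) + successes * log(p) + (trials - successes) * log(1 - p)


def compare_loci_counts(successes: int, trials: int, max_loci: int = 5) -> dict:
    """Log-likelihood of the observed count for k = 1..max_loci required loci."""
    return {k: binomial_log_likelihood(successes, trials, fixation_fraction(k))
            for k in range(1, max_loci + 1)}


scores = compare_loci_counts(5, 30)  # 5 of 30 spores stayed adhesive after curing
best_k = max(scores, key=scores.get)
for k, ll in scores.items():
    print(f"k={k}: expected fraction {fixation_fraction(k):.3f}, log-likelihood {ll:.2f}")
print("best k:", best_k)
```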


[MOT3+] occurs in wild strains

Nearly two dozen proteins with prion-forming capacity have recently
been discovered in yeast (reviewed in ref. 15). Serendipitously, an
endogenous hexahistidine motif in one of them, the transcriptional repressor
Mot3, permits detection on SDD-AGE immunoblots. Sixteen yeast
proteins contain a hexahistidine motif, but only Mot3 has a prion-like
sequence. We found [MOT3+] amyloids in six of the 96 diverse
strains we tested (Supplementary Table 1).

To determine whether wild [MOT3+] prions produced potentially adaptive
phenotypes, we first took advantage of Mot3’s known function as a
transcriptional repressor of genes involved in cell wall production.
We tested wild [MOT3+] strains for resistance to the cell wall toxin
calcofluor white. Strain Y-35, isolated from holly berries, was highly
resistant to calcofluor. Resistance was heritably reduced by GdHCl
treatment, and this treatment also eliminated [MOT3+] amyloids
(Fig. 4a).

Because Mot3 is a transcriptional repressor, its switching into or out of the
prion form has the potential to broadly transform information flow.
We next screened wild [MOT3+] strains and their cured derivatives
against the same growth conditions used for the wild [PSI+] strains.
Many phenotypes were altered by prion curing. For example,
[MOT3+] NCYC 3311, a Finnish soil isolate, was resistant to acidic
conditions. [MOT3+] Y-1537, isolated from grape must, was resistant
to fluconazole. Both traits were virtually eliminated by curing with
GdHCl (Fig. 4b).

To determine whether the traits were [MOT3+]-dependent, we expressed
a Mot3 protein lacking the PrD (Mot3ΔPrD) that is immune to prion
sequestration but retains normal transcriptional function (R.H.,
personal communication). Analogous to Sup35ΔPrD, this eliminates
[MOT3+] phenotypes without affecting other prions. NCYC 3311
lost acid resistance and Y-1537 lost fluconazole resistance with this
plasmid, but not with plasmids expressing the full-length protein
(Fig. 4c). These phenotypes were, therefore, [MOT3+]-dependent.
More broadly, the divergent consequences of this prion in different
strains establish that, like [PSI+], [MOT3+] provides phenotypic
diversity by altering the manifestation of natural genetic variation.


Wild strains harbour additional prions 


How commonly do wild strains harbour prions that can create such
heritable phenotypic diversity? Lacking a means of detecting them by
SDD-AGE, we used a phenotypic approach: we measured the growth
of wild strains before and after GdHCl curing, across the same
conditions used for [PSI+] and [MOT3+] (Supplementary Fig. 4). To
ensure that any such phenotypes did not arise from de novo
mutations, we compared four colonies of each wild strain with four cured
derivatives (in total testing 5,520 isolates across 12 conditions).
Remarkably, over a third of the original wild strains (255) had
phenotypes that differed in the same way between all four parental
wild isolates and all four cured derivatives. Moreover, nearly half of the
growth properties conferred by these GdHCl-curable heritable
elements were beneficial (Supplementary Table 1). The wild strain
collection was biased towards wine isolates derived from natural
fermentations. But it also contained many samples from beer, soil,
fruit, infected human patients and commercial strains recently subjected
to man-made selective pressures to enhance properties for baking and
brewing. Curable phenotypes, both beneficial and detrimental,
occurred in yeasts from all of these niches. Even among the limited
number of conditions tested here, prion curing had mixed phenotypic
consequences in 15% of the strains. Thus, like [PSI+] and [MOT3+],
these prions created different trade-offs: traits that were beneficial
or detrimental depending on circumstances.

Figure 4 | Prions of the cell-wall-remodelling transcription factor Mot3
have diverse phenotypic consequences in wild strains. a, Strain Y-35 is
resistant to the cell-wall-targeting drug calcofluor white, but resistance is
strongly reduced by passage on GdHCl. Probing for Mot3's endogenous
hexahistidine motif reveals that Mot3 amyloids are retained in the uncured, but
not the cured, isolates. Apparent monomeric signal results from cross-reactivity
to other yeast proteins. b, The [MOT3+] strains NCYC 3311 and Y-1537 (each
shown in filled blue circles) are resistant to acidic growth conditions and
fluconazole, respectively. Each phenotype is reversed by prion curing with
GdHCl passage (open red circles). c, These phenotypes are also reversed by
expression of a non-aggregating version of Mot3, Mot3ΔPrD (open red circles).
Expression of Mot3 itself (closed blue circles) did not affect the phenotype of
either strain. Error bars represent the standard deviation of four independent
biological replicates.
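The screening criterion for GdHCl-curable phenotypes (four parental isolates versus four cured derivatives of each strain) can be sketched as below. The isolate count follows directly: 690 strains × (4 + 4) isolates = 5,520. This is a minimal illustration, assuming higher growth values (for example OD600) are better in every condition. The text does not give the authors' noise threshold, so none is applied here. The function names and growth values are hypothetical.

```python
from typing import Iterable, Optional, Sequence


def classify_curable_phenotype(parental: Sequence[float],
                               cured: Sequence[float]) -> Optional[str]:
    """Return 'beneficial' if every parental isolate grew better than every cured
    derivative, 'detrimental' if every parental isolate grew worse, otherwise None
    (no consistent difference)."""
    if min(parental) > max(cured):
        return "beneficial"
    if max(parental) < min(cured):
        return "detrimental"
    return None


def has_mixed_consequences(calls: Iterable[Optional[str]]) -> bool:
    """True if a strain has both beneficial and detrimental curable phenotypes."""
    found = {c for c in calls if c is not None}
    return found == {"beneficial", "detrimental"}


# Hypothetical growth values for one strain in two conditions.
condition_a = classify_curable_phenotype([1.20, 1.25, 1.22, 1.30], [0.80, 0.85, 0.90, 0.82])
condition_b = classify_curable_phenotype([0.40, 0.45, 0.42, 0.41], [0.60, 0.62, 0.58, 0.65])
print(condition_a, condition_b, has_mixed_consequences([condition_a, condition_b]))
```

Requiring complete separation between all four parental and all four cured isolates is what guards against rare de novo mutations in a single colony.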

To test whether the altered phenotypes arose from prion-mediated
protein templating or from some unknown (yet somehow heritable
and reproducible) effect of transient GdHCl exposure, we investigated
25 randomly chosen strains more rigorously. Transient expression of
Hsp104DN phenocopied the effects of GdHCl curing in 22 of the 25
strains (Supplementary Table 1), establishing their dependence on
this prion-partitioning factor.

Another signature of prions is transmission to other cells via 
cytoplasmic transfer (cytoduction). Do the curable phenotypic 
elements of wild yeasts share this property? Because of the complexities 
in working with wild yeast, we used a derivative of W303, a common 
laboratory strain, as a universal ‘recipient’ for cytoplasmic transfer. 
Because prion-dependent traits can differ with genetic background, 
we chose wild strains with multiple curable traits as donors for these 
unknown prions, to increase the likelihood that transfer could be 
scored phenotypically. The South African wine strain WE372 had 
two traits that were heritably lost by prion curing (Fig. 5a, arrows): 
unusually robust growth on rich medium and poor growth at pH 9. 
The clinical isolate YJM428 had three such traits (Fig. 5a, arrows): 
robust growth in sodium chloride and 4-NQO, but poor growth on 
maltose. 

After crossing donor and recipient strains to produce heterokaryons, 
we selected buds that bore only the nucleus of the W303 recipient but 
had received cytoplasm from the wild donor (Fig. 5b top; see Methods 
for details). We tested 12 such cytoductants from each mating to 
determine if they had inherited stable new traits from the cytoplasmic 
transfer. 

Poor growth at pH 9 was not transferred, but robust growth on rich
medium was transferred from WE372 donors to all 12 W303 recipients
(Fig. 5c, red arrows). NaCl resistance was not transferred, but both
enhanced growth on 4-NQO and poor growth on YP-maltose were
transferred from YJM428 donors to all 12 W303 recipients (Fig. 5c,
red arrows). The fidelity of the transferred traits established that they
were not due to rare chromosome transfers that can occur in such
crosses. The lack of transfer for some traits suggests that, as for
[PSI+] and [MOT3+], traits produced by these unknown heritable
cytoplasmic factors depend upon the genetic background. All
transferred traits were curable by passage on GdHCl (Fig. 5c, black arrows),
strongly indicating they were due to prions. To exclude the possibility
that these traits were due to mitochondrial DNA transfer, we repeated
the entire experiment with WE372 and YJM428 donors that had been
cured of prions before heterokaryon formation (Fig. 5b, bottom). None
of the resulting cytoductants acquired the new phenotypes (Fig. 5d).


Prions alter the relationship between genotype and phenotype in wild strains

How significantly do the prions of wild yeasts alter the phenotypic
manifestation of genetic diversity? We examined the relationship
between genotype and phenotype in the 21 strains in our collection
whose genomes had been fully sequenced. As previously reported,
the Spearman’s correlation (rho) between genotypes and phenotypes
is typically on the order of 0.3 to 0.4 for wild yeast (in our strains
and conditions, rho = 0.39; P= 3.5 X 10 '°). Prion loss weakened the
correlation between genotype and phenotype (Spearman’s
rho = 0.27; P= 1.5 X 10°”). This finding was robust to random
permutations of the data (P = 0.0001) and was clear even when [PSI+]
and [MOT3+] strains were removed from the analysis. Thus, the
prions these wild strains harbour broadly interface with polymorphisms
in their genomes to influence the relationship between genotype and
phenotype.
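The text does not specify how genotype and phenotype values were paired for the Spearman correlation. Here is a minimal sketch that assumes each observation is a pair of strains with a genetic distance and a phenotypic distance, a common design for this kind of comparison; treat that as a hypothetical reading. Spearman's rho is computed as the Pearson correlation of tie-averaged ranks. The permutation P value shuffles one variable, which ignores the fact that pairwise distances share strains. For distance data, a Mantel-style test that permutes strain labels is the stricter choice.

```python
import random
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho = Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def spearman_permutation_test(x, y, n_perm: int = 9999, seed: int = 0) -> Tuple[float, float]:
    """Observed rho and a two-sided permutation P value (shuffling y's ranks)."""
    rng = random.Random(seed)
    rx, ry = average_ranks(x), average_ranks(y)
    observed = pearson(rx, ry)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(ry)
        if abs(pearson(rx, ry)) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Running this once on distances computed with prion-containing strains and once on their cured derivatives would give the kind of before/after comparison of rho described above.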


Discussion 


The stable inheritance and complex phenotypes that prions produce
arise from changes in protein conformation rather than from changes
in nucleic acids. This non-canonical mode of inheritance has sparked
considerable excitement and provoked intense mechanistic study (for
review see ref. 27). But doubts about whether [PSI+] and other prions
exist in nature have fuelled deep controversy over their relevance.
We find that prions and prion-dependent phenotypes are widespread
in nature, establishing their biological importance.

Figure 5 | The curable Hsp104-dependent epigenetic elements in wild yeast
can be cytoplasmically transferred. a, Growth of wild strains WE372,
YJM428 and their cured derivatives in selective conditions. Original isolates of
each strain are shown in grey bars and their cured derivatives (indicated by
black arrows) are shown in open bars. Error bars are one standard deviation
from six biological replicates. b, Schematic for cytoduction experiments and
control to ensure that phenotypes are due to prions, rather than transfer of
mitochondrial DNA. c, Growth measurements of the laboratory recipient (blue
bars), cytoductants (grey bars; red arrows denote cytoplasmic transfer of
phenotypes), and cured derivatives of those cytoductants (open bars; black
arrows denote curability of phenotypes). Error bars are the standard deviation
of growth measurements from 12 cytoductants, the 12 cured derivatives of
those cytoductants, or 6 biological replicates of the recipient strain.
d, Cytoductants that received cytoplasm from cured derivatives of the original
wild isolates (hashed bars) did not show an equivalent change in phenotype.
Error bars represent the standard deviation of growth measurements from 12
cytoductants or 6 biological replicates of the recipient strain.

Saccharomyces cerevisiae is perhaps the most thoroughly characterized organism in experimental science. How, then, could this pervasive influence on the inheritance of biological traits have been missed for so long? The frequency of [PSI+] in wild strains suggests that previous efforts to find it simply did not examine enough strains (see Supplementary Information for further discussion). But we suspect that standard practices in yeast genetics provide a far more general explanation. Phenotypic analysis of new traits typically begins by testing for 2:2 segregation in crosses and discarding variants that do not follow this Mendelian pattern. It is equally common to discard variants that prove to be restricted to individual strains. Thus, prion-based phenotypes may largely have been ignored because investigations were strongly biased by the known rules of nucleic-acid-based inheritance, and because of a pragmatism that neglected the biological significance of strain-to-strain variation.

We find that in wild strains a prion-dependent trait can readily be fixed by the meiotic re-assortment of endogenous genetic variation. (Such traits can probably also be fixed by new mutations, a phenomenon not yet explored.) Thus, prions provide a robust mechanism by which yeast can increase their phenotypic diversity epigenetically, in a manner that readily allows that diversity to become hard-wired in subsequent generations. Evidence for an uncannily similar transition, from an epigenetically inherited trait to a genetically hard-wired trait, has recently surfaced for drug resistance in cancer cells. The underlying epigenetic mechanism in that case is chromatin-based rather than protein-based. But together, these observations point to a new view of the importance of epigenetics in evolutionary processes.

Under stressful conditions, cells increase the rate at which they switch into and out of the [PSI+] state. This link between heritable phenotypic diversity and environmental contingency is a natural consequence of stress-induced disruptions in protein homeostasis. Stress-regulated factors interface with protein homeostasis to drive increased prion switching in several ways (see Supplementary Information for discussion). Beyond these more general homeostatic mechanisms, it seems likely that conformational switches in individual prion proteins will prove to be regulated by additional protein-specific interactions and post-translational protein modifications.

Surprisingly, 40% of the traits produced by the wild prions we observed were beneficial to growth under the 12 conditions we tested. This contrasts with the consequences of mutational variants, the vast majority of which are either silent or detrimental. It therefore seems probable that the underlying cryptic variation that creates prion-based traits in wild yeast, as well as the prions themselves, has been subject to previous selective events. In any case, the gain and loss of prions seems to constitute a sophisticated bet-hedging mechanism that allows cells to explore heritable new phenotypes more frequently in circumstances where they are not well suited to their environments.

Most of the 25 prions discovered to date are RNA-binding proteins, DNA-binding proteins and signal transducers, proteins that play key roles in governing the flow of information in cellular networks. Prion-mediated alterations in such functions allow access to complex traits in a single step. Together, the interplay among prions, environmental stress, cryptic genetic variation and fixation provides a means to transition from a Lamarckian mode of inheritance to a Darwinian framework of mutation and natural selection. As with Hsp90 (ref. 26), prions provide a robust route to the inheritance of environmentally induced traits, one that has moved from being merely plausible to demonstrable.


METHODS SUMMARY 

SDD-AGE. Cells were grown for 18-24 h in 1 ml YPD followed by lysis in 96-well blocks. Lysates were treated with 2% SDS at room temperature and then clarified by centrifugation. Electrophoresis and blotting were performed as described.
Molecular cloning and yeast techniques. Standard cloning procedures were used for the construction of yeast plasmids marked with drug-resistance cassettes. Yeast strains were obtained from the sources indicated in the Supplementary Information. Yeast handling, propagation and prion curing were as described. Cytoductions used donor strains created by disrupting the HO locus with an antibiotic-resistance marker and then sporulating. The recipient strain was a respiration-deficient KAR1-15 derivative of W303. High-throughput phenotyping and data analyses were performed as described.

Sequence analysis of wild strains. Strains UCD978 and 5672 were sequenced to 100-fold coverage using the Illumina HiSeq platform. After quality-control filtering, each genome was aligned against the S. cerevisiae reference sequence using the BWA aligner. Correlation between similarity in genotype and similarity in phenotype was calculated as in ref. 26.


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS

Sequencing of wild strains. We sequenced two wild [PSI+] strains (UCD978 and 5672), a clinical isolate (YJM653), a wine strain (114) and two lab isolates (W303 and YDJ25, a strain almost completely isogenic to the laboratory reference S288C strain). Using an Illumina HiSeq platform we obtained two lanes of 76-base-pair paired-end reads and one lane of 101-base-pair paired-end reads, resulting in an average coverage of 100-fold (all reads will be available at NCBI under accession number PRJNA81619). After quality-control filtering, each genome was aligned against the S. cerevisiae reference sequence (sacCer2, June 2008 assembly, downloaded from UCSC on April 1, 2011: http://hgdownload.cse.ucsc.edu/goldenPath/sacCer2/chromosomes/) using the BWA aligner, followed by SNP and indel calling with respect to the reference using mpileup from the samtools package.

We then estimated the genetic distance between each unique pair of genomes using a combination of custom code and the Genome Analysis Toolkit (GATK). For each genome pair we considered the union of SNPs in both genomes: a SNP present in only one genome of the pair was retained, whereas SNPs present in both genomes were discarded. We constructed a neighbour-joining tree using the APE R package, with a genetic distance matrix built from the counts of retained SNPs with a quality score of at least 150 as called by mpileup. The horizontal scale bar in Supplementary Fig. 2 represents a genetic distance of 10,000 SNPs. To ensure that the tree was not sensitive to the choice of SNP quality cutoff, we performed the analysis at different quality thresholds and obtained essentially identical results.
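The pairwise distance described above, the number of SNPs found in one genome of a pair but not the other, is the size of a symmetric set difference. A minimal Python sketch, assuming each genome's calls are a hypothetical dict mapping a (chromosome, position, alternate allele) key to an mpileup-style quality score, and assuming the quality filter of 150 is applied to each genome separately before comparison:

```python
from itertools import combinations


def passing_snps(calls, min_qual=150):
    """Keep SNP keys whose quality score is at least min_qual (assumed per-genome filter)."""
    return {site for site, qual in calls.items() if qual >= min_qual}


def snp_distance_matrix(genomes, min_qual=150):
    """Distance = number of SNPs present in exactly one genome of each pair.

    genomes: {name: {(chrom, pos, alt): quality}} (hypothetical input format).
    Returns the sorted genome names and a nested dict of pairwise distances.
    """
    filtered = {name: passing_snps(calls, min_qual) for name, calls in genomes.items()}
    names = sorted(filtered)
    matrix = {a: {b: 0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        # Symmetric difference: shared SNPs are discarded, unique ones counted.
        d = len(filtered[a] ^ filtered[b])
        matrix[a][b] = matrix[b][a] = d
    return names, matrix
```

The resulting matrix could then be passed to a neighbour-joining routine such as `nj()` in the APE R package; the tree-building step is not reproduced here.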

We also separately sequenced the prion-determining region (PrD) of SUP35. Only one strain had a non-synonymous change in the PrD, and this change was unlikely to influence Sup35's inherent prion propensity. All polymorphisms in the region spanning nucleotides −338 to 1102 of SUP35 are indicated in Supplementary Table 2.

Molecular cloning. Standard cloning procedures were used. Gateway destination vectors were constructed as follows. The URA3 ORF in pAG416-GPD-ccdB, pAG416-TEF-ccdB and pAG416-SUP35-ccdB was replaced with cassettes conferring G418 or hygromycin B resistance from plasmids pUG6 and pAG32, respectively, to generate pAG41NEO-GPD-ccdB, pAG41NEO-SUP35-ccdB and pAG41HPH-TEF-ccdB. These plasmids contain the GPD1, SUP35 and TEF2 promoters, respectively, for driving the expression of exogenous genes. Gateway entry clones bearing the coding sequences for SUP35, SUP35ΔPrD, HSP104, HSP104(K218T,K620T) and MOT3 were constructed as described using S288C genomic DNA and a plasmid bearing HSP104 K218T K620T (ref. 3) as PCR templates. Site-directed mutagenesis was used to delete the PrD (amino acids 8-157) from the MOT3 entry clone (R.H., personal communication). Entry clones and destination vectors were recombined in Gateway reactions to yield pAG41NEO-GPD-HSP104, pAG41NEO-GPD-HSP104(K218T,K620T), pAG41NEO-SUP35-SUP35, pAG41NEO-SUP35-SUP35ΔPrD, pAG41HPH-TEF-MOT3 and pAG41HPH-TEF-MOT3ΔPrD.

Yeast techniques. Yeast strains (Supplementary Table 1) were obtained from stock centres or generously provided by the sources indicated. All strains were stored as glycerol stocks at −80 °C and revived on YPD before testing. Strains that were [PSI+] in the original screen were re-ordered individually from the Department of Viticulture and Enology collection (University of California Davis) and their prion status was verified by a second SDD-AGE. Yeast were grown in YPD at 30 °C unless indicated otherwise. The following media supplements were included where relevant: 3 mM GdHCl, 200 µg ml−1 G418, or 250 µg ml−1 hygromycin B. Yeast were transformed with a standard lithium-acetate protocol as described.

To eliminate prions chemically, strains were passaged four times on rich medium containing 3 mM GdHCl. To eliminate prions by overexpression of Hsp104, cells were transformed with plasmids expressing Hsp104 (wild type or K218T, K620T) from a strong constitutive promoter (GPD). Transformants were passaged three times on selective media, followed by four passages on YPD to allow plasmid loss, which was confirmed by the absence of growth on selective media. Finally, because GdHCl is known to increase the frequency of petites, all GdHCl- and Hsp104-cured isolates used in the phenotyping experiments for [PSI+], [MOT3+] and the analyses of 25 strains containing unknown prions were first checked for respiration competence on glycerol. Cured and pre-cured isolates grew comparably well on glycerol.

A PCR-based deletion strategy was used to replace one genomic copy of URA3 in strain UCD978 with an hphMX4 module from pAG32. The resulting strain was then sporulated. Random sporulants recovered on YPD containing hygromycin B were then sporulated again. Hygromycin-resistant sporulants from the second round of sporulation were tested for ploidy by growth on media containing 5-fluoroorotic acid (5-FOA), inability to grow on media lacking uracil, and ability to mate with haploid tester strains. Mating type was observed to be stable, indicating that UCD978 is heterothallic. Genomic DNA content was determined by SYTOX Green staining as described, using a BD LSR II flow cytometer. BY4741 and BY4743 were used as haploid and diploid references, respectively.

For cytoduction experiments, we used a derivative of the common lab strain W303 as a recipient for cytoplasmic transfer (Fig. 5b). The strain carried a dominant KAR1-15 allele, which prevents nuclear fusion during mating but allows cytoplasmic transfer. The strain also carried multiple auxotrophic markers and a mitochondrial DNA defect. This allowed cytoplasmic transfer to be scored through the restoration of mitochondrial respiration in the absence of transfer of auxotrophic markers. Haploid, mating-competent derivatives of the wild 'donor' strains were created by using an antibiotic-resistance marker to knock out the HO locus (HO would otherwise preclude mating by causing haploid cells to self-mate). The recipient and donor strains were patched together on rich media, followed by selection of heterokaryons on dropout media containing glycerol as a carbon source.

[RNQ+] strain UCD664 was verified to be Saccharomyces uvarum by colony PCR amplification and sequencing of the rDNA internal transcribed spacer region using oligonucleotides ITS1 and ITS4, as described. Another prion-containing strain not annotated as S. cerevisiae, UCD587, was originally annotated “S. cerevisiae race bayanus”. We ordered this strain two independent times from the stock centre and found it to be S. cerevisiae based on ITS1 sequence and growth characteristics.
Phenotypic assays. For agar adhesion and invasion analyses, colonies were allowed to grow for 5-7 days at 30 °C. Plates were then gently rinsed under running water to remove non-adherent cells, and then photographed. The agar surface was then gently rubbed with a gloved finger under running water to remove all remaining non-invasive cells.

For growth measurements, wild yeast strains were inoculated in quadruplicate into 384-well plates containing 40 µl YPD per well and grown to saturation at 30 °C (typically 48 h) in a humidified chamber. After complete re-suspension, QRep polypropylene 384-well pin replicators (Genetix) were used to transfer cells (200-500 nl) to new 384-well plates in duplicate. These new plates contained 40 µl per well of selective media: alternative carbon sources at 2% final concentration (YP-maltose, YP-galactose, YP-glycerol or YP-raffinose), compounds dissolved in YPD (0.5 M NaCl, 5% ethanol, 0.4 mg ml−1 4-NQO, 1 mM tBOOH, 32 mg ml−1 fluconazole or 50 mM hydroxyurea), acidic and basic conditions (YPD at pH 4 or pH 9), and finally YPD alone (as a control). Concentrated stocks of 4-NQO (4 mg ml−1) were made in molecular-biology-grade dimethyl sulphoxide (DMSO) and stocks of fluconazole (64 mg ml−1) were made in ethanol. All plates were incubated at 30 °C, covered and in a humidified chamber. Growth was measured approximately every 20 h by OD600 with a Tecan Sapphire2 plate reader (after complete re-suspension by gentle agitation). Hits were chosen if all four replicates showed a significant change in growth (P < 0.01, determined by t-test) for at least two consecutive time points after curing of Hsp104-dependent prions.
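The hit criterion (significance at P < 0.01 for at least two consecutive time points, in all four replicates) reduces to a run-length check once P values are available. The sketch below assumes those P values have already been computed, for example by the t-test mentioned above, and arranged as one time-ordered list per replicate; that input format, the function names and the defaults are hypothetical.

```python
def has_consecutive_hits(p_values, alpha=0.01, run_length=2):
    """Return True if at least `run_length` consecutive time points have P < alpha.

    p_values: P values ordered by time point (hypothetical input format).
    """
    run = 0
    for p in p_values:
        run = run + 1 if p < alpha else 0  # extend the current run or reset it
        if run >= run_length:
            return True
    return False


def is_hit(p_values_per_replicate, alpha=0.01, run_length=2):
    """Hypothetical rule: every replicate must meet the consecutive-time-point criterion."""
    return all(has_consecutive_hits(p, alpha, run_length) for p in p_values_per_replicate)
```

A single significant time point flanked by non-significant ones does not count; this guards against one noisy plate reading producing a hit.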

Growth rates of [PSI+]-containing wild strains (and their cured derivatives) in conditions used for laboratory propagation (YPD, YM broth, FM broth, wort agar or Wallerstein nutrient agar) were measured in quadruplicate by diluting 10° exponentially growing cells of each strain into humidified 96-well plates containing 150 µl of medium. The OD600 of each well was measured every 15 min after re-suspension by agitation (15 s) in a plate reader (Multiskan Go, Thermo Scientific). Plates were incubated at 30 °C. After 4 days no appreciable loss in volume was observed in the exterior wells of the plates, ensuring that the growth rates we measured were not an artefact of evaporation. Growth rates were determined by fitting the exponential phase of plots of OD600 versus time.
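Fitting the exponential phase amounts to a straight-line fit of ln(OD600) against time, whose slope is the specific growth rate. A minimal sketch, assuming the exponential phase is selected with hypothetical OD600 bounds (0.05 to 0.5), since the text does not state how the window was chosen, and without blank subtraction:

```python
from math import exp, log


def growth_rate(times_h, od600, od_min=0.05, od_max=0.5):
    """Least-squares slope of ln(OD600) versus time (per hour) within an assumed OD window."""
    pts = [(t, log(od)) for t, od in zip(times_h, od600) if od_min <= od <= od_max]
    if len(pts) < 2:
        raise ValueError("need at least two points inside the exponential window")
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in pts)
    sxx = sum((t - mean_t) ** 2 for t, _ in pts)
    return sxy / sxx


def doubling_time(rate_per_h):
    """Doubling time in hours for a specific growth rate in h^-1."""
    return log(2) / rate_per_h


# Synthetic readings every 15 min for 12 h: OD600 = 0.02 * exp(0.4 t), capped at 1.2
# to mimic saturation (capped points fall outside the window and are ignored).
times = [0.25 * i for i in range(49)]
od = [min(0.02 * exp(0.4 * t), 1.2) for t in times]
rate = growth_rate(times, od)
```

Fitting on the log scale, rather than fitting an exponential directly, keeps the estimator a simple linear regression and weights early and late exponential-phase points comparably.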

Correlation between similarity in genotype and similarity in phenotype was calculated in the R statistical computing software package using published genotypic correlation among sequenced strains and linear regression across the conditions in Supplementary Table 1. Significance was established by random permutation of the phenotype data: ten thousand permutations of the data did not reveal a similarly large change in genotype-phenotype correlation occurring by chance.
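The text does not specify exactly how the phenotype data were permuted. One plausible reading, sketched below purely as an assumption, is a paired permutation: for each strain, the original and cured growth profiles are randomly swapped, and the drop in genotype-phenotype rho is recomputed. The P value is the fraction of permutations (with a +1 correction) whose drop is at least as large as the observed one. Phenotype similarity is again assumed to be the Pearson correlation of growth profiles; all data and names are hypothetical.

```python
import random
from itertools import combinations
from math import sqrt


def _ranks(v):
    """1-based ranks with ties averaged."""
    order = sorted(range(len(v)), key=v.__getitem__)
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def _pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def geno_pheno_rho(geno_sim, profiles):
    """Spearman rho between pairwise genotype and phenotype similarity."""
    pairs = list(combinations(sorted(profiles), 2))
    pheno = [_pearson(profiles[a], profiles[b]) for a, b in pairs]
    geno = [geno_sim[p] for p in pairs]
    return _pearson(_ranks(geno), _ranks(pheno))


def permutation_test(geno_sim, original, cured, n_perm=10_000, seed=0):
    """Observed drop in rho after curing, and a one-sided permutation P value."""
    observed = geno_pheno_rho(geno_sim, original) - geno_pheno_rho(geno_sim, cured)
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_perm):
        a, b = {}, {}
        for strain in original:
            # Randomly swap each strain's original and cured profiles.
            if rng.random() < 0.5:
                a[strain], b[strain] = cured[strain], original[strain]
            else:
                a[strain], b[strain] = original[strain], cured[strain]
        if geno_pheno_rho(geno_sim, a) - geno_pheno_rho(geno_sim, b) >= observed:
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)


# Hypothetical toy data: four strains, growth in five conditions.
original = {"A": [1.0, 2.0, 3.0, 4.0, 5.0], "B": [1.2, 2.1, 2.9, 4.2, 5.1],
            "C": [5.0, 3.0, 4.0, 1.0, 2.0], "D": [2.0, 5.0, 1.0, 3.0, 4.0]}
cured = {"A": [2.0, 1.0, 3.5, 4.0, 4.5], "B": [3.0, 2.0, 1.0, 5.0, 4.0],
         "C": [4.0, 3.5, 3.0, 2.0, 1.5], "D": [1.0, 4.0, 2.0, 5.0, 3.0]}
geno_sim = {("A", "B"): 0.9, ("A", "C"): 0.2, ("A", "D"): 0.3,
            ("B", "C"): 0.25, ("B", "D"): 0.35, ("C", "D"): 0.5}
observed, p_value = permutation_test(geno_sim, original, cured, n_perm=1000)
```

With 10,000 permutations and no permuted drop as large as the observed one, this +1-corrected estimator gives P of about 0.0001, the same order as the permutation P value reported in the Results.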

SDD-AGE. Except where indicated below, SDD-AGE was performed as follows. Yeast were inoculated into 1 ml YPD in 96-well round-bottom blocks (Nunc P8241) with each well containing a single 3-mm glass bead. The blocks were incubated for 18-24 h at 30 °C with 220 r.p.m. agitation. Cells were collected by centrifugation at 3,000 r.c.f. for 2 min, resuspended in 200 µl water, and then centrifuged again. Approximately 100 µl of acid-washed glass beads were then added to each well, followed by 80 µl lysis buffer (100 mM Tris pH 8, 1% Triton X-100, 50 mM β-mercaptoethanol, 3% HALT protease inhibitor cocktail, 30 mM N-ethylmaleimide, and 12.5 U ml−1 Benzonase nuclease). Blocks were then sealed with a rubber mat (Nunc 276002) and shaken at maximum speed twice for 3 min on a Qiagen TissueLyser II. To each well was then added 35 µl 4× sample buffer (2× TAE, 20% glycerol, 8% SDS, 0.01% bromophenol blue). The blocks were then vortexed briefly and allowed to incubate at room temperature for 3 min, followed by centrifugation for 2 min at 3,000 r.c.f. to remove cell debris. Electrophoresis and capillary blotting to Hybond ECL nitrocellulose were performed as described.
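As a rough check on the lysate composition, adding 35 µl of 4× sample buffer (8% SDS) to 80 µl of lysis buffer dilutes the stock by a factor of 35/115. The sketch assumes the wash water was fully removed and ignores the volume of cells and glass beads, so the numbers are approximate: about 2.4% SDS at roughly 1.2× buffer, in the same range as the 2% SDS given in the Methods Summary. The helper name is hypothetical.

```python
def final_concentration(stock, v_added, v_existing):
    """Concentration after adding v_added of a stock to v_existing of a solute-free solution."""
    return stock * v_added / (v_existing + v_added)


# Volumes in microlitres; bead and cell volumes are ignored (assumption).
sds_percent = final_concentration(8.0, 35.0, 80.0)  # 8% SDS in the 4x sample buffer
buffer_x = final_concentration(4.0, 35.0, 80.0)     # 4x sample buffer strength
```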

Sup35 and Rnq1 were detected using well-characterized antibodies. Detection of amyloids using the Sup35 antibody was markedly improved by treating the blots with Re-Blot Plus stripping solution (Millipore) before probing. Available antibodies to the yeast prion proteins Ure2, Swi1 and Cyc8 did not satisfactorily distinguish lab strains containing amyloids of these proteins from those that did not. Consequently, we did not attempt to identify these prions in wild strains. Horseradish-peroxidase-conjugated secondary antibodies were detected with Lumigen TMA-6 chemiluminescent substrate (GE).

For the detection of Mot3 amyloids, cells and lysates were prepared as described. Blots were probed with a hexahistidine antibody (GE Biosciences), which recognizes an endogenous hexahistidine motif in Mot3. From multiple experiments with a well-characterized [MOT3+] isolate in the S288C genetic background (ref. 14 and R.H., personal communication), we estimated the false-negative and false-positive rates with this antibody to be approximately 40% and 5%, respectively, under the conditions used here. We occasionally observed the appearance of [mot3−] colonies (as determined by SDD-AGE and phenotype) upon restreaking of [MOT3+] isolates. Polymorphisms in MOT3 (ref. 25), as well as differences in the growth and spheroplasting efficiencies of wild strains, may influence the validity of SDD-AGE for assessing Mot3's prion status. Nevertheless, phenotypic differences verified our designations of [MOT3+] for three of six strains identified by SDD-AGE.
Crystal structure of the channelrhodopsin light-gated cation channel


Hideaki E. Kato, Feng Zhang, Ofer Yizhar, Charu Ramakrishnan, Tomohiro Nishizawa, Kunio Hirata, Jumpei Ito, Yusuke Aita, Tomoya Tsukazaki, Shigehiko Hayashi, Peter Hegemann, Andrés D. Maturana, Ryuichiro Ishitani, Karl Deisseroth & Osamu Nureki


Channelrhodopsins (ChRs) are light-gated cation channels derived from algae that have shown experimental utility in 
optogenetics; for example, neurons expressing ChRs can be optically controlled with high temporal precision within systems 
as complex as freely moving mammals. Although ChRs have been broadly applied to neuroscience research, little is known 
about the molecular mechanisms by which these unusual and powerful proteins operate. Here we present the crystal 
structure of a ChR (a C1C2 chimaera between ChR1 and ChR2 from Chlamydomonas reinhardtii) at 2.3 Å resolution. The
structure reveals the essential molecular architecture of ChRs, including the retinal-binding pocket and cation conduction 
pathway. This integration of structural and electrophysiological analyses provides insight into the molecular basis for the 
remarkable function of ChRs, and paves the way for the precise and principled design of ChR variants with novel properties. 


Organisms ranging from archaebacteria to human beings capture energy and/or information contained within environmental sources of light by using photoreceptors called rhodopsins, which consist of seven-transmembrane-helix proteins, called opsins, covalently linked to retinal. On the basis of primary sequences, the corresponding opsin genes are classified into two groups: microbial (type I) and animal (type II). Type I opsin genes are found in archaea, eubacteria, fungi and algae, whereas type II opsin genes are expressed in animals, including human beings. The type II proteins indirectly influence transmembrane ion currents by coupling to G-protein-based signal transduction pathways. In contrast, the type I proteins (not normally found in animals) include directly light-activated regulators of transmembrane ion conductance, such as the light-driven ion pumps called bacteriorhodopsins and halorhodopsins (BRs and HRs) and the light-driven ion channels ChRs. The light-driven ion pumps have been extensively studied, and their structure-function relationships are well known. By contrast, very little is known about the structure of ChRs or the mechanism by which these seven-transmembrane proteins conduct cations in a light-dependent manner.

Beginning in 2005, it was found that ChRs could be expressed in mammalian neurons to mediate precise and reliable control of action potential firing in response to light pulses, without the need for exogenous retinal in vertebrate systems. ChRs have now been used to control neuronal activity in a wide range of animals, resulting in insights into fundamental aspects of circuit function as well as dysfunction and treatment in pathological states. However, despite the rapid progress of optogenetics (a technology also encompassing the use of ion pumps, such as HRs), virtually nothing is known about how a seven-transmembrane protein can form a light-switchable channel for cations. Although a rough helical arrangement was visible in the recently published electron microscopy structure of two-dimensional ChR2 crystals at 6 Å resolution, amino acid positioning and insights into channel function remained completely lacking. A high-resolution three-dimensional structure would be of enormous value, not only to enhance understanding of microbial opsin-based channels, but also to guide optogenetics in the generation of ChR variants with novel function related to spectrum, selectivity and kinetics.

Even with limited structural models, ChR variants have been engineered with faster or extended open-state lifetimes, shifted absorption spectra, reduced desensitization, and increased expression and photocurrent magnitude. These advances represent the tip of the iceberg in terms of what could be achieved for all of the above properties, as well as for altered ion selectivity and unitary (single-channel) conductance properties, if detailed structural knowledge could be obtained to facilitate true electrostatic calculations and molecular dynamics simulations.

Here we present the ChR crystal structure at 2.3 Å resolution. This high-resolution information, along with electrophysiological analyses, has revealed the fundamentals of ChR architecture and points the way to a basic working model for channelrhodopsin function.


Overall ChR structure 


ChR2 from C. reinhardtii consists of 737 amino acids; the seven transmembrane domains (TMs) and photocurrent functionality are all contained within the amino-terminal ~300 amino acids. To identify the most promising candidates for structural studies, we constructed and explored an extensive range of different ChRs and ChR chimaeras with distinct carboxy-terminal truncations. Using fluorescence-detection size-exclusion chromatography (FSEC), we found that a novel chimaeric and truncated sequence termed C1C2, primarily consisting of ChR1 (ref. 3) without its C terminus and with the last two TMs swapped for those from ChR2 (related to previous chimaeras but with an additional six-amino-acid modification of the C terminus, namely removal of the sequence NKGTGK), was not only expressed well in Sf9 insect cells but also showed good stability and monodispersity, as well as spectral characteristics similar to those of previous chimaeras (Supplementary Fig. 1). The crystals obtained from fully dark-adapted C1C2 in the lipidic cubic phase belonged to the C2221 space group and diffracted X-rays to 2.3 Å resolution. We solved the C1C2 structure by the multiple anomalous dispersion (MAD) method, using mercury-derivatized crystals (Supplementary Fig. 2). As far as we know, this is the first example of phase determination by MAD for a crystal obtained in the lipidic cubic phase.

Department of Biophysics and Biochemistry, Graduate School of Science, The University of Tokyo, 2-11-16 Yayoi, Bunkyo-ku, Tokyo 113-0032, Japan. Department of Bioengineering and Howard Hughes Medical Institute, Stanford University, Stanford, California 94305, USA. RIKEN SPring-8 Center, Hyogo 679-5148, Japan. Bioengineering Department, Nagaoka University of Technology, Niigata 940-2188, Japan. Department of Chemistry, Graduate School of Science, Kyoto University, Kyoto 606-8502, Japan. Institute of Biology, Experimental Biophysics, Humboldt-University, Invalidenstraße 42, D-10115 Berlin, Germany.

16 FEBRUARY 2012 | VOL 482 | NATURE | 369

©2012 Macmillan Publishers Limited. All rights reserved

The truncated C1C2 chimaera (residues 1-342) is composed of an N-terminal extracellular domain (N domain, residues 24-83, marked in Fig. 1a, d), the seven TMs (TM1-TM7; residues 84-317) connected by three cytoplasmic loops (ICL1-ICL3) and three extracellular loops (ECL1-ECL3), and the C-terminal intracellular domain (C domain, residues 318-356) (Fig. 1b, d). In addition to the N-terminal residues 1-23, which are processed as a signal peptide (data not shown), residues 24-48, 110-117 and 343-356 are structurally disordered and invisible in the electron density map, whereas the core transmembrane region is clearly resolved (Fig. 1a). Searches of the Protein Data Bank using the Dali server (http://ekhidna.biocenter.helsinki.fi/dali) suggested that the N domain, consisting of a short 3₁₀-helix and two β-strands, has a novel fold. Within each C1C2 protomer, 6 lipids and 43 water molecules were observed.

Two C1C2 protomers were found to be tightly associated into a closely apposed dimer, as previously predicted from electron microscopy. Interfacial interactions occur in the N domain, ECL1, TM3 and TM4 of each molecule (Fig. 1b, c). Notably, Cys 66 (27), Cys 73 (34) and Cys 75 (36) in the N domain (ChR2 numbering is shown in parentheses here and below for comparison with earlier literature) form three disulphide bonds between protomers. As Cys 73 and Cys 75 are highly conserved in ChRs, this interaction may contribute to stabilizing the N-domain interaction and molecular dimerization (Supplementary Fig. 3).


Structural comparison with BR 


We next compared the C1C2 structure with those of BR and bRh (bovine rhodopsin). The primary sequence of ChR is similar to that of BR as well as other microbial opsins, such as xanthorhodopsin and sensory rhodopsin II (Supplementary Fig. 4). Consistent with this sequence similarity, C1C2 superimposed well on BR (PDB accession 1IW6), but not on bRh (PDB accession 3C9L) (Fig. 1d and Supplementary Fig. 5). TM3-6 of C1C2 and BR are very similar, and the position of the protonated Schiff base is conserved (Fig. 1e), but the two structures differ in three distinct features. First, C1C2 has additional N-terminal and C-terminal domains. The N domain, as noted above, contributes to dimer formation, and the C domain may be involved in subcellular localization and scaffolding in Chlamydomonas; for example, to tether ChR to the algal eyespot. Second, TM7 of C1C2 protrudes into the intracellular space, projecting ~18 Å from the membrane surface, and the intracellular end of TM7 is shifted towards the central axis of the monomer by 2.7 Å as compared with BR (Fig. 1d). Although the function of this protruding part of TM7 is unclear, His 313, His 317 and Gly 318 may contribute to stabilizing the intracellular C domain via a water-mediated hydrogen-bonding network (Supplementary Fig. 6). Last, and most importantly, the extracellular ends of TM1 and TM2 of C1C2 are tilted outward by 3.0 Å and 4.1 Å, respectively, compared to those of BR (Fig. 1e). These tilts enlarge the cavity formed by TM1, 2 and 7 and allow water influx for a cation-translocation pathway, as discussed later.


Retinal-binding pocket, Schiff base and counterion

As in other microbial-type rhodopsins, all-trans retinal (ATR) is covalently bound to Lys 296 (257) on TM7 (Lys 216 in BR), forming the Schiff base. As in BR, five aromatic residues (Trp 163, Phe 217, Trp 262, Phe 265 and Phe 269) are located around the polyene chain and the β-ionone ring, forming a hydrophobic pocket for ATR (Fig. 2a, b), whereas Cys 167 (128), Thr 198 (159) and Ser 295 (256) form a less hydrophobic pocket and may contribute to colour shift (Fig. 2a, b).

Figure 1 | Structure of C1C2 and comparison with BR. a-c, Crystal structure of the C1C2 dimer, viewed parallel to the membrane from two angles (a, b), and viewed from the extracellular side (c). C1C2 consists of the N domain, the seven transmembrane helices (TM1-TM7) connected by extracellular loops (ECL1-ECL3) and intracellular loops (ICL1-ICL3), and the C domain. Disordered regions are represented as dotted lines. The ATR is coloured pink. d, e, Side view (d) and extracellular view (e) of the superimposed TMs of C1C2 (green) and BR (orange). The yellow double arrows indicate the shifts of the extracellular parts of TM1 and TM2.
Figure 2 | Structural comparison of the retinal-binding pocket between C1C2 and BR. a, Structure of the retinal-binding pocket of C1C2, with an omit map of ATR at 3σ and of the surrounding residues (subtract 39 from the C1C2 residue number to obtain ChR2 numbering) at 3.5σ. b, Structure of the retinal-binding pocket of BR.


A previous report suggested that the side chains of Cys 167 (128) and Asp 195 (156) (Thr 90 and Asp 115 in BR) directly hydrogen bond with each other (Fig. 2a), and that this interaction may function as the molecular switch that directs the transition to the conducting state. However, in the present C1C2 structure, the distances between the thiol group of Cys 167 and the two carboxyl oxygen atoms of Asp 195 are 4.4 Å and 4.6 Å, respectively, and the thiol group of Cys 167 is associated not with Asp 195 but with the π-electron system of the retinal molecule (Fig. 2a).
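Whether two atoms can hydrogen bond is judged largely from their separation. The sketch below computes interatomic distances from Cartesian coordinates and compares them with a 3.5 Å donor-acceptor cutoff, a commonly used heuristic that is an assumption here rather than a criterion stated in the text. The coordinates are hypothetical, chosen only to reproduce the 4.4 Å and 4.6 Å distances quoted above; real coordinates would come from the deposited structure. `math.dist` requires Python 3.8 or later.

```python
from math import dist

# Assumed rule-of-thumb donor-acceptor cutoff in angstroms (not taken from the text).
HBOND_CUTOFF = 3.5


def within_hbond_range(atom_a, atom_b, cutoff=HBOND_CUTOFF):
    """Return (distance, within_cutoff) for two (x, y, z) coordinates in angstroms."""
    d = dist(atom_a, atom_b)
    return d, d <= cutoff


# Hypothetical coordinates placing the Asp 195 carboxyl oxygens 4.4 and 4.6 A from Cys 167 SG.
cys167_sg = (0.0, 0.0, 0.0)
asp195_oxygens = {"OD1": (4.4, 0.0, 0.0), "OD2": (0.0, 4.6, 0.0)}

for name, coords in asp195_oxygens.items():
    d, ok = within_hbond_range(cys167_sg, coords)
    print(f"Cys167 SG - Asp195 {name}: {d:.1f} A, within H-bond cutoff: {ok}")
```

Both distances exceed the assumed cutoff, which is consistent with the conclusion above that Cys 167 and Asp 195 are not directly hydrogen bonded in this structure.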

In BR, a water molecule receives a proton from the protonated
Schiff base and donates a proton to Asp 85 (ref. 27; Fig. 3b); this
arrangement is conserved in C1C2. However, in C1C2 the distances
from the protonated Schiff base are 4.4 Å, 3.4 Å and 3.0 Å for the
water molecule, Glu 162 (123) (Asp 85 in BR) and Asp 292 (253)
(Asp 212 in BR), respectively (Fig. 3a, b). Therefore, in C1C2,
Asp 292 or possibly Glu 162, but not the water located between them,
may directly receive a proton from the protonated Schiff base. In BR,
Asp 212 retains a low pKa, because it is hydrogen bonded to Tyr 57
and Tyr 185, which do not move during the photocycle (Fig. 3b). On
the other hand, in C1C2, Tyr 57 and Tyr 185 are substituted with
Phe 133 and Phe 265, and Asp 292 forms a hydrogen bond only with
water (Fig. 3a); thus Asp 292 could move relatively freely during the
photocycle. Therefore, in C1C2 the pKa of Asp 292 can change, in
contrast to the corresponding residue in BR. Moreover, the pKa values
of Glu 162 and Asp 292, calculated using PROPKA (Supplementary
Table 2), showed that Glu 162 may be protonated and Asp 292 may
be deprotonated in our structure. Thus, we suggest that Asp 292,
rather than Glu 162, is the primary proton acceptor in C1C2,
consistent with the finding that Glu 123 mutants show current
amplitudes similar to wild type. To test this notion further, we
expressed the E162A and D292A mutants of C1C2 in HEK293 cells and
recorded photocurrents in response to 465-nm light pulses (Fig. 3c, d
and Supplementary Figs 7 and 8). Replacement of Glu 162 by alanine
resulted in moderately decreased currents, whereas substitution of
Asp 292 by alanine almost completely abolished photocurrents despite
robust membrane expression. Moreover, the onset time constant (τon)
of the D292A mutant was significantly larger than that of wild type
(Supplementary Fig. 9), consistent with the structure showing that
Asp 292, rather than Glu 162, may be the major proton acceptor from
the protonated Schiff base in ChR.

Figure 3 | The protonated Schiff base and its counterions in C1C2 and BR.
a, b, Structures of the environment around the Schiff base in C1C2 (a) and BR
(b). Numbers indicate the distance between two atoms connected by dashed
lines. c, Effects of the mutation of two possible counterions on the
photocurrent. Photocurrents of C1C2-expressing HEK293 cells were
measured at 16 different holding potentials. WT, wild type. d, The peak
amplitudes of the photocurrents, normalized by the cell's input capacitance.
Values are means and s.e.m. of 7–15 experiments. **P < 0.01, ***P < 0.001.


Electronegative pore framed by four TM helices

We calculated the electrostatic surface potential of C1C2, which
revealed an electronegative pore formed by TM1, 2, 3 and 7
(Fig. 4a). In this pathway, a number of negatively charged residues,
including Glu 129 (90), Glu 136 (97) and Glu 140 (101), as well as
Glu 162 (123) and Asp 292 (253), are aligned along the pore
(Fig. 4b). Because most of the negatively charged residues are derived
from TM2, we suggest that the ion conductance and selectivity of
C1C2 are mainly defined by TM2.

On the extracellular side of the pore, a vestibule formed by the N
domain and ECL1–3 opens up to a diameter of about 8 Å, where
Lys 154 (115), Lys 209 (170) and Arg 213 (174) form a slightly
electropositive surface around the vestibule (Supplementary Fig. 10).
Deeper within the vestibule, Arg 159 (120), Tyr 160 (121), Glu 274
(235) and Ser 284 (245) form a weak electronegative surface and fix
the positions of TM3, 6 and 7 by a water-mediated hydrogen-bond
network (Supplementary Fig. 11a). As these four residues are highly
conserved not only in ChRs but also in BRs (Arg 82, Tyr 83, Glu 194
and Glu 204, respectively), and the corresponding residues in BR
reportedly have an important role in proton pumping, we generated
the R159A mutant in C1C2. We found that this mutant did not
produce a photocurrent despite robust membrane expression
(Fig. 4c, d and Supplementary Figs 7 and 8). Because the orientation
of Arg 159 is quite different from that of the corresponding Arg
residue in BR, and these residues form the extracellular hydrophilic
surface, we suggest that this conserved cluster has an important role
in creating the extracellular vestibule, rather than in proton
movement as in BR (Supplementary Fig. 11b).

Figure 4 | Cation-conducting pathway formed by TM1, 2, 3 and 7. a, Pore-
lining surface calculated by the CAVER program, coloured by the electrostatic
potential. Dashed red lines indicate putative intracellular vestibules. b, Close-up
views of the surface of the pore, with the 17 polar lining residues (subtract 39
from the C1C2 residue number to obtain ChR2 numbering). Hydrogen bonds
are shown as black dashed lines. c, Photocurrents of mutants of the five residues
within the pathway, measured under the same conditions as in Fig. 3c. d, The
peak amplitudes of the photocurrents, as in Fig. 3d. *P < 0.05, ***P < 0.001.
Error bars represent s.e.m.
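The Figure 3 and 4 captions report peak photocurrents normalized by each cell's input capacitance and summarized as mean and s.e.m. Below is a minimal Python sketch of that bookkeeping. The per-cell currents and capacitances are hypothetical, and taking the s.e.m. as the sample standard deviation divided by √n is the usual convention, assumed here rather than stated in the text.

```python
import math
import statistics


def current_density(peak_pa: float, capacitance_pf: float) -> float:
    """Peak photocurrent normalized by input capacitance, in pA/pF."""
    return peak_pa / capacitance_pf


def mean_sem(values: list[float]) -> tuple[float, float]:
    """Mean and standard error of the mean (sample s.d. / sqrt(n))."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two cells for an s.e.m.")
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(n)


if __name__ == "__main__":
    # Hypothetical (peak current in pA, capacitance in pF) pairs, one per cell.
    cells = [(-850.0, 17.0), (-1200.0, 20.0), (-640.0, 16.0)]
    densities = [current_density(i, c) for i, c in cells]
    m, sem = mean_sem(densities)
    print(f"{m:.1f} ± {sem:.1f} pA/pF (n = {len(densities)})")
```

Normalizing by capacitance, a proxy for membrane area, makes cells of different sizes comparable before they are averaged.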

In the middle of the pore, 12 polar residues (Gln 95 (56), Thr 98
(59), Ser 102 (63), Glu 122 (83), Glu 129 (90), Lys 132 (93), Glu 136
(97), Glu 140 (101), Glu 162 (123), Thr 285 (246), Asp 292 (253) and
Asn 297 (258)) form a hydrophilic and strongly electronegative
surface (Fig. 4b). To investigate their contributions to ChR function,
we measured photocurrents, kinetics and selectivity for four mutants
(Q95A, K132A, E136A and E140A) (Fig. 4c, d and Supplementary Figs
7–9 and 12). The K132A mutant had faster kinetics and similar
current amplitude relative to wild type, whereas the Q95A and E140A
mutants exhibited moderately reduced currents, and the E136A mutant
showed very little photocurrent, despite robust membrane expression.
Three of the four mutants (Q95A, K132A and E136A) altered ion
selectivity (Supplementary Fig. 12); therefore, we suggest that this
pore is important for cation conduction and that, as previously
suggested, Glu 136 (97) is essential.
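Ion selectivity in whole-cell recordings like these is commonly quantified from reversal potentials with the Goldman–Hodgkin–Katz (GHK) equation. The section does not say how the selectivity in Supplementary Fig. 12 was computed, so the following Python sketch is only an assumed, standard bi-ionic analysis. Take a bath containing monovalent cation X (for example the 140 mM NaCl bath in the Methods) and a pipette containing K+ (140 mM KCl). GHK then gives E_rev = (RT/F) ln(P_X[X]o / P_K[K]i). The function name and example reversal potentials are hypothetical, and divalent ions and protons are ignored.

```python
import math

R = 8.314462618   # gas constant, J mol^-1 K^-1
F = 96485.33212   # Faraday constant, C mol^-1


def permeability_ratio(e_rev_mv: float, temp_c: float = 25.0,
                       out_mm: float = 140.0, in_mm: float = 140.0) -> float:
    """P_X / P_K from a bi-ionic reversal potential (GHK, monovalent ions only).

    X is the cation in the bath (out_mm) and K+ is the cation in the pipette
    (in_mm). Divalent ions, protons and other permeant species are ignored.
    """
    rt_over_f_mv = 1000.0 * R * (temp_c + 273.15) / F
    return (in_mm / out_mm) * math.exp(e_rev_mv / rt_over_f_mv)


if __name__ == "__main__":
    for e_rev in (-20.0, 0.0, 10.0):  # hypothetical reversal potentials, mV
        print(f"E_rev = {e_rev:+5.1f} mV -> P_Na/P_K ≈ {permeability_ratio(e_rev):.2f}")
```

With equal concentrations on both sides, a negative reversal potential indicates that the channel favours K+ over the bath cation.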

Although this putative cation-conducting pathway is open towards
the extracellular side, the cytoplasmic side of the pathway is
occluded by two constrictions (Figs 4b and 5). The first
constriction is formed by three highly conserved polar residues:
Ser 102 (63), Glu 129 (90) and Asn 297 (258) (Figs 4b and 5a). At this
constriction site, Ser 102, whose β-OH group hydrogen bonds to
the main-chain carbonyl oxygen of Thr 98, fixes the position of
Asn 297, and Asn 297 in turn fixes Glu 129 by hydrogen bonds.
Glu 129 protrudes into and occludes the pore. To analyse this putative
channel gate, we prepared four mutants (S102D, E129A, E129Q and
N297D) and measured photocurrents, kinetics and ion selectivity
(Supplementary Figs 7–9, 12 and 13). We found that E129A,
E129Q and N297D affect ion selectivity and that S102D, E129Q and
N297D affect channel kinetics (Supplementary Figs 9 and 12).
These results, consistent with previous work, indicate the importance
of these three residues and suggest that cations pass through this
constriction site in the conducting state.

The second constriction is formed by the phenol group of Tyr 109
(70) (Figs 4b and 5b). Given the high B-factor of the C-terminal end of
TM1 (Supplementary Fig. 14) and a previous Fourier transform infrared
spectroscopy (FT-IR) study showing that the α-helices undergo
conformational changes during the photocycle, movement of the TM1
C-terminal end may open the pore exit formed between TM1, 2 and 7.
As TM1 does not directly interact with the retinal chromophore, the
signal of retinal isomerization is expected to be transmitted to TM1 via
movements of TM2, 3 and/or 7. However, we cannot exclude the
possibility that movements of TM2, 3 and/or 7 form a cytoplasmic
vestibule next to Tyr 109, and further studies will be required to
identify the pore exit.


Discussion 


This first crystal structure of a light-gated cation channel in the closed/
dark state, at 2.3 Å resolution, provides insight into ChR dimerization,
retinal binding and cation conductance. Moreover, owing to the large N
domain unique to ChR, it has been difficult to align the ChR sequence
precisely with other microbial rhodopsins (notably BR), and the present
structure-based alignment (Supplementary Fig. 4) will assist in the
design and interpretation of functional analyses, including
electrophysiological and spectroscopic studies of ChR at the molecular
level. The structural features around the ATR, Schiff base and
conduction pathway also provide insight into the blueshifted absorption
spectrum of ChR (λmax = 470 nm) compared with that of BR
(λmax = 568 nm). In general, the maximum absorption wavelength of
retinal proteins is determined by the energy difference between the
ground (S0) and excited (S1) states, and this gap is mainly affected by
the planarity of the conjugated system of the retinal chromophore, the
distance between the protonated Schiff base and its counterion, and the
interaction of the chromophore with polar or polarizable residues.
Although the planarity of the ATR is unchanged between ChR and BR,
the counterion of ChR, Asp 292, is located ~1 Å closer to the Schiff base
than the corresponding Asp residue of BR (Fig. 3a, b), and the
negatively charged residues are aligned along the conducting pathway
(Fig. 4a, b). These environments are likely to stabilize the S0 state of
ChR, thus enlarging the energy gap between the S0 and S1 states and
thereby causing the relative blueshift of the absorption spectrum.

Figure 5 | Two constriction sites on the cytoplasmic side of C1C2 in the
closed state. a, The first constriction site is formed by Ser 102 (63), Glu 129 (90)
and Asn 297 (258). Hydrogen bonds are shown as black dashed lines. The blue
dashed line represents a possible proton transfer pathway. b, The second
constriction, made by Tyr 109. The cavity formed by TM1, 2 and 7 is occluded
by Tyr 109, and the cavity formed by TM2, 3 and 7 is occluded by Glu 122 and
His 173.
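The blueshift argument can be made concrete by converting the two absorption maxima quoted in the Discussion (470 nm for ChR, 568 nm for BR) into photon energies with E = hc/λ, where hc ≈ 1239.84 eV·nm. Reading λmax directly as the S0–S1 gap is a simplification assumed in this sketch; the function name is illustrative.

```python
# Photon energy E = h*c / lambda, with h*c ≈ 1239.84 eV·nm.
HC_EV_NM = 1239.84


def photon_energy_ev(wavelength_nm: float) -> float:
    """Photon energy in eV for a wavelength in nm."""
    return HC_EV_NM / wavelength_nm


if __name__ == "__main__":
    chr_ev = photon_energy_ev(470.0)  # ChR lambda_max quoted in the text
    br_ev = photon_energy_ev(568.0)   # BR lambda_max quoted in the text
    print(f"ChR: {chr_ev:.2f} eV, BR: {br_ev:.2f} eV, difference: {chr_ev - br_ev:.2f} eV")
```

On this simplified reading, the ~100 nm blueshift corresponds to an energy gap that is roughly 0.46 eV larger in ChR.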

Much about the photocycle remains unknown, but it is thought to be
similar to that of BR, in which the essential early event is the
dipole change of the protonated Schiff base, which alters the nitrogen
pKa by several orders of magnitude. In the case of ChRs, this may
cause the release of the Schiff base proton to Asp 292 (probably not to
Glu 162 or to water, because Asp 292 is closer than these other two
moieties, which may also explain why the D292A mutant is inactive;
Fig. 3). The protonation of Asp 292 is likely to repel Lys 132 (93), as
with Arg 82 in BR, and this movement may enlarge the pore diameter and
help cations to pass (Figs 3a and 4b). It is also thought that channel
opening may be coupled with reprotonation of the Schiff base. Given
the calculated pKa values (Supplementary Table 2) and the distances from
the Schiff base nitrogen atom, we suggest two candidates for this proton
donor (Glu 122 and Glu 129; Supplementary Fig. 15), but further
studies, including structural determinations of photocycle intermediate
states, will be required to refine our model.
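The pKa reasoning in this and earlier paragraphs, for example that Glu 162 may be protonated while Asp 292 is deprotonated, follows the Henderson–Hasselbalch relation. Under that relation, each pKa unit changes the protonated-to-deprotonated ratio tenfold, so a shift of "several orders of magnitude" corresponds to several pKa units. The Python sketch below uses hypothetical pKa values and a hypothetical pH. The computed values are in Supplementary Table 2 and are not reproduced here.

```python
def fraction_protonated(pka: float, ph: float) -> float:
    """Henderson-Hasselbalch: fraction of an acidic group that carries its proton."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


if __name__ == "__main__":
    ph = 7.4  # hypothetical pH
    # Hypothetical pKa values for illustration only.
    for pka in (4.0, 7.4, 10.0):
        f = fraction_protonated(pka, ph)
        print(f"pKa {pka:>4}: {f:.4f} protonated, ratio {f / (1.0 - f):.3g}")
```

A group with a pKa well above the pH stays almost fully protonated, while one well below it is almost fully deprotonated.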

In recent years, many strategies have been applied to create ChR
variants with improved properties for optogenetics, ranging from
designer ChR variants based on functional and structural similarities
between BR and ChR (E123X, H134R, C128X, I131V, D156A, T159C,
C1V1), to chimaera construction along with mutagenesis
(ChRGR, L132C, ChD, ChEF, C1V1) (Fig. 6a). These approaches
have generated a number of ChR variants with useful properties, but a
high-resolution crystal structure is a prerequisite for the design of ChR
variants with ideal properties. The present crystal structure describes
the environment around the retinal-binding pocket (Fig. 6a, b), which
will enable optimized design of red- and blueshifted ChR variants. In
addition, the structure of the cation-conducting pathway may facilitate
construction of ChR variants with improved photocurrents,
photosensitivity, cation selectivity and kinetics. For example, K132A and
Q95A show strong photocurrents and K+ selectivity (Supplementary
Figs 9 and 12), which could be useful to suppress neural activity.


Figure 6 | Distribution of known mutations and possible candidates for 
future mutations. a, Mutations that affect both the absorption spectrum and 
the kinetics (Cys 167 (128), Glu 162 (123) and Asp 195 (156); deep red), the 
conductance (Thr 198 (159); light green), the selectivity (Leu 171 (132); dark 
blue) and the kinetics (Glu 122 (83), Ile 170 (131) and His 173 (134); dark cyan) 
of ChR2. b, Polar (Glu 162 (123), Thr 166 (127), Cys 167 (128), Asp 195 (156), 
Thr 198 (159) and Ser 295 (256); magenta) and non-polar (Ile 170 (131), Ile 199
(160), Gly 202 (163), Leu 221 (182) and Gly 263 (224); orange) residues 
surrounding ATR. 


16 FEBRUARY 2012 | VOL 482 | NATURE | 373 


©2012 Macmillan Publishers Limited. All rights reserved 


ARTICLE 


To understand the photocycle in more detail, further structural 
studies, including determination of crystal structures in intermediate 
states, are clearly needed. However, the present structural informa- 
tion represents a key step in enabling the principled design of 
ChR variants with new properties, and will accelerate both applica- 
tions of optogenetics to intact-systems biology, and basic mechanistic 
understanding of these remarkable photoreceptor proteins. 


METHODS SUMMARY 


C1C2 was cloned into a pFastBac1 vector as a fusion with a cleavable enhanced
green fluorescent protein (EGFP)-His8 tag. The fusion protein was expressed in
insect cells, solubilized in 2.5% (w/v) n-dodecyl-β-D-maltoside (DDM) and 0.5%
(w/v) cholesteryl hemisuccinate, and purified by nickel affinity chromatography.
The C-terminal EGFP was then cleaved by His-tagged tobacco etch virus (TEV)
protease (homemade), and the sample mixture was passed through Ni-NTA resin
again to remove the cleaved His-tagged EGFP and TEV protease. The sample was
further purified by size-exclusion chromatography. Crystals were grown in a
lipidic cubic phase using monoolein. Diffraction data were measured at beamline
X06SA of the Swiss Light Source and at beamline BL32XU of SPring-8. The
structure was solved by the MAD method using a mercury derivative. Data
collection and refinement statistics are presented in Supplementary Table 1.
Electrophysiological recordings were conducted using patch clamp on HEK293
cells expressing wild-type and mutant C1C2.


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS


Expression and purification of C1C2. Chimaeras between ChR1 and ChR2 from
C. reinhardtii and other algal species were subcloned into the pCGFP-EU vector
for expression in HEK293 cells. The tobacco etch virus (TEV) protease cleavage
site, the coding sequence of enhanced GFP (EGFP) and the octa-histidine tag
(EGFP-His8) were introduced at the C terminus of the chimaeric constructs. All
constructs were screened by FSEC analysis. The gene encoding the best
chimaera (C1C2) was subcloned into the modified pFastBac1 vector for
expression in Sf9 insect cells. Baculovirus-infected Sf9 cells were cultured in
Sf900II (Invitrogen) at 27 °C for 24 h, and then the temperature was reduced to
20 °C. Cells were harvested 72 h after infection by centrifugation at 6,000g for
10 min. The pellets were disrupted by two passages through a microfluidizer at
15,000 pounds per square inch, and were resuspended in a buffer containing
300 mM NaCl, 50 mM Tris-HCl, pH 8.0, 5% glycerol and 0.1 mM
phenylmethylsulfonyl fluoride (PMSF). The cell debris was cleared by
centrifugation at 10,000g for 40 min, and the crude membrane fraction was
collected by ultracentrifugation (Ti45 rotor, 43,000 r.p.m., 1 h). This fraction
was solubilized in a buffer containing 300 mM NaCl, 50 mM Tris-HCl, pH 8.0,
5% glycerol, 20 mM imidazole, 0.1 mM PMSF, 2.5% n-dodecyl-β-D-maltoside
(DDM) and 0.5% cholesteryl hemisuccinate (CHS). The insoluble material was
removed by ultracentrifugation (Ti70 rotor, 45,000 r.p.m., 30 min), and the
supernatant was mixed with Ni-NTA resin (QIAGEN). After binding for 1 h,
C1C2 was eluted in buffer supplemented with 300 mM imidazole. Following
cleavage of EGFP-His8 by His-tagged TEV protease (homemade), the sample
was reloaded onto the Ni-NTA column to remove the cleaved EGFP-His8. The
flow-through containing C1C2 was collected, concentrated, and further
purified by size-exclusion chromatography in 150 mM NaCl, 50 mM Tris-HCl,
pH 8.0, 5% glycerol, 0.05% DDM and 0.01% CHS. Peak fractions were pooled
and concentrated to ~10 mg ml−1 for crystallization. For the mercury
derivative, the concentrated protein was incubated with a sixfold molar excess
of methylmercury chloride at 20 °C for 1 h. The derivative was
ultracentrifuged and used for crystallization experiments.
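The sixfold molar excess used for the mercury derivative depends on the protein's molar concentration, which follows from its mass concentration and molecular mass. Conveniently, mg ml−1 divided by kDa gives mM directly. The Python sketch below uses hypothetical inputs: 10 mg ml−1, taken from the crystallization concentration, and a placeholder mass of 40 kDa. Neither value is stated for the derivatization step itself.

```python
def protein_molar_mm(conc_mg_per_ml: float, mass_kda: float) -> float:
    """Protein concentration in mM (mg/ml divided by kDa gives mmol/l)."""
    return conc_mg_per_ml / mass_kda


def reagent_mm(conc_mg_per_ml: float, mass_kda: float, molar_excess: float) -> float:
    """Reagent concentration (mM) needed for a given molar excess over protein."""
    return protein_molar_mm(conc_mg_per_ml, mass_kda) * molar_excess


if __name__ == "__main__":
    # Hypothetical inputs: 10 mg/ml protein with a placeholder mass of 40 kDa.
    print(f"protein: {protein_molar_mm(10.0, 40.0):.2f} mM")
    print(f"methylmercury chloride for 6x excess: {reagent_mm(10.0, 40.0, 6.0):.2f} mM")
```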

Crystallization. C1C2 was mixed with monoolein (Sigma) in a 2:3 protein to lipid
ratio (w/w). Aliquots (100 nl) of the protein–LCP mixture were spotted on a
96-well sandwich plate and overlaid with 1 µl of precipitant solution by the
crystallization robot, mosquito LCP (TTP LabTech). Native crystals were obtained
in 30–34% (w/v) PEG500DME, 100 mM Na citrate, pH 6.0, 100 mM MgCl2,
100 mM NaCl and 100 mM (NH4)2SO4, whereas the derivative crystals were grown
in 31% (w/v) PEG500DME, 100 mM HEPES-NaOH, pH 7.0, 200 mM Li2SO4 and
10 mM ATP. All crystals were incubated for 2–3 weeks in the dark. They were
harvested using micromounts (MiTeGen), and were flash-cooled in liquid nitrogen
without any additional cryoprotectant.

Structure determination. X-ray diffraction data sets for the native and mercury-
derivatized protein crystals were collected on beamline X06SA at SLS and
beamline BL32XU at SPring-8, using a 1-µm-wide, 15-µm-high microbeam.
Data were indexed and scaled with the programs XDS and SCALA, or with
DENZO and SCALEPACK from the HKL2000 program suite (HKL Research).
Experimental phases were determined by the MAD method, using the four Hg
sites identified with the program SHELX. Subsequent refinements of the heavy-
atom parameters and phase calculations were performed with the program
SHARP. The data collection and phasing statistics are shown in Supplementary
Table 1. The initial model structure of C1C2 was built with the program
Phyre, using the Anabaena sensory rhodopsin structure (PDB accession 1XIO)
as the template. The resultant structure was manually modified to fit into the
experimental electron density maps, using the program Coot. The structure was
then refined with the program Phenix. Figures were prepared with CueMol
(http://www.cuemol.org).

Electrophysiology. HEK293 cells were cultured on poly-lysine-coated, glass-
bottom culture dishes (Matsunami), and were transfected with 0.5 µg of a plasmid
construct containing the GFP-tagged C1C2 or the GFP-tagged C1C2 mutants. At
24–30 h after transfection, the cells were placed in a bath medium containing
140 mM NaCl, 1 mM CaCl2, 2 mM MgCl2, 10 mM HEPES and 5 mM glucose (pH
7.4 with NaOH), under an inverted microscope (Olympus IX71). Calcium and
potassium photocurrents were measured by replacing 140 mM NaCl with 90 mM
CaCl2 or 140 mM KCl, respectively. For proton photocurrents, the bath solution
contained 5 mM NaCl, 135 mM N-methyl-D-glucamine, 1 mM CaCl2, 2 mM MgCl2,
10 mM HEPES and 5 mM glucose (pH 6.4). A borosilicate patch pipette (Harvard
Apparatus), with a resistance of about 5–8 MΩ, was filled with 140 mM KCl,
5 mM EGTA, 2 mM MgCl2 and 10 mM HEPES (pH 7.2 with KOH). C1C2
currents were recorded in voltage-clamp mode in the whole-cell
configuration. The cells were held at a membrane potential of −80 mV, and were
depolarized by 10 mV voltage steps of 1.8 s up to +70 mV. The light-dependent
currents were activated 200 ms after the depolarization step with 465 nm light
(1.5 mW mm−2) for 1,000 ms, delivered by a high-power LED illumination system
(LEX2-B, Brainvision) connected to an A/D converter (Digidata 1440, Axon
CNS, Molecular Devices) controlled by the pClamp10 software (Axon CNS).
Currents were measured using an Axopatch 200B amplifier (Axon CNS, Molecular
Devices), filtered at 2 kHz and sampled at 5 kHz, using a Digidata 1440A
digitizer (Axon CNS) controlled by the pClamp10 software (Axon CNS).
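The voltage and light protocol above can be laid out explicitly. Assume the series starts at the −80 mV holding level itself and rises in 10 mV steps to +70 mV. That gives 16 test potentials, which matches the "16 different holding potentials" in the Figure 3 caption. At 5 kHz sampling, each 1.8 s step spans 9,000 samples, with the light on from 200 ms to 1,200 ms. The Python sketch below is not pClamp protocol syntax, and the constant and function names are hypothetical.

```python
# Parameters as stated in the Methods; names are illustrative.
HOLDING_MV = -80
STEP_MV = 10
MAX_MV = 70
STEP_DURATION_S = 1.8
LIGHT_DELAY_S = 0.2
LIGHT_DURATION_S = 1.0
SAMPLE_RATE_HZ = 5_000


def step_potentials(start_mv: int = HOLDING_MV, stop_mv: int = MAX_MV,
                    step_mv: int = STEP_MV) -> list[int]:
    """Test potentials from the holding level up to stop_mv, inclusive."""
    return list(range(start_mv, stop_mv + 1, step_mv))


def sample_index(time_s: float, rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Nearest sample index for a time measured from the start of a step."""
    return round(time_s * rate_hz)


if __name__ == "__main__":
    potentials = step_potentials()
    light_on = sample_index(LIGHT_DELAY_S)
    light_off = sample_index(LIGHT_DELAY_S + LIGHT_DURATION_S)
    print(f"{len(potentials)} sweeps: {potentials[0]} ... {potentials[-1]} mV")
    print(f"{sample_index(STEP_DURATION_S)} samples per step; light on samples "
          f"{light_on}-{light_off}")
```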

Fluorescence measurements. The cells were transfected with 0.5 µg wild-type
C1C2 or C1C2 mutants, using Fugene 6, for 30 h. The cells were then washed with
PBS, fixed with 4% paraformaldehyde in PBS for 20 min at room temperature
(20 °C), and washed again with PBS before microscopy observation. GFP
fluorescence was observed with a laser confocal microscope (FV1000, Olympus).
To estimate membrane expression of C1C2 and its mutants, the ratio of
membrane to cytosolic fluorescence was determined.
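The membrane-expression estimate reduces to a ratio of mean GFP intensities in two regions of each cell. Below is a minimal Python sketch with hypothetical region-of-interest pixel values. The optional background subtraction is an assumption; the Methods do not say whether a background was removed.

```python
import statistics


def expression_ratio(membrane_pixels: list[float], cytosol_pixels: list[float],
                     background: float = 0.0) -> float:
    """Mean membrane fluorescence divided by mean cytosolic fluorescence.

    `background` is an assumed offset subtracted from both regions; the
    Methods do not state whether such a correction was applied.
    """
    membrane = statistics.fmean(membrane_pixels) - background
    cytosol = statistics.fmean(cytosol_pixels) - background
    if cytosol <= 0:
        raise ValueError("cytosolic signal must exceed the background")
    return membrane / cytosol


if __name__ == "__main__":
    # Hypothetical GFP intensities from hand-drawn regions of interest.
    membrane_roi = [300.0, 320.0, 310.0]
    cytosol_roi = [100.0, 110.0, 90.0]
    print(f"membrane/cytosol ratio: {expression_ratio(membrane_roi, cytosol_roi):.2f}")
```

A ratio well above 1 indicates that the GFP-tagged channel is enriched at the plasma membrane rather than retained in the cytosol.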

Ultraviolet/visible spectroscopy. Ultraviolet/visible absorption spectra were
recorded with an Ultrospec 3300 pro ultraviolet/visible spectrophotometer
(Amersham Biosciences) using 1-cm quartz cuvettes. Freshly prepared
C1C2 was used for the measurements. pH was adjusted by addition of 100 volumes
of buffer solution yielding final concentrations of 50 mM Na citrate, pH 4.0,
50 mM Na acetate, pH 5.0, 50 mM Na cacodylate, pH 6.0, 50 mM HEPES, pH
7.0, 50 mM Tris, pH 8.0 and 9.0, or 50 mM CAPS, pH 10.0, plus 150 mM NaCl, 5%
glycerol, 0.05% DDM and 0.01% CHS.
Light echoes reveal an unexpectedly cool η Carinae
during its nineteenth-century Great Eruption


A. Rest, J. L. Prieto, N. R. Walborn, N. Smith, F. B. Bianco, R. Chornock, D. L. Welch, D. A. Howell, M. E. Huber,
R. J. Foley, W. Fong, B. Sinnott, H. E. Bond, R. C. Smith, I. Toledo, D. Minniti & K. Mandel


η Carinae is one of the most massive binary stars in the Milky
Way. It became the second-brightest star in our sky during its
mid-nineteenth-century 'Great Eruption', but then faded from view
(with only naked-eye estimates of brightness). Its eruption is
unique in that it exceeded the Eddington luminosity limit for ten
years. Because it is only 2.3 kiloparsecs away, spatially resolved
studies of the nebula have constrained the ejected mass and velocity,
indicating that during its nineteenth-century eruption, η Car
ejected more than ten solar masses in an event that released ten
per cent of the energy of a typical core-collapse supernova,
without destroying the star. Here we report observations of light
echoes of η Carinae from the 1838–1858 Great Eruption. Spectra of
these light echoes show only absorption lines, which are blueshifted
by −210 km s−1, in good agreement with predicted expansion
speeds. The light-echo spectra correlate best with those of G2-to-G5
supergiants, which have effective temperatures of around
5,000 kelvin. In contrast to the class of extragalactic outbursts
assumed to be analogues of the Great Eruption of η Carinae,
the effective temperature of its outburst is significantly lower than
that allowed by standard opaque wind models. This indicates that
other physical mechanisms, such as an energetic blast wave, may have
triggered and influenced the eruption.
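The Eddington limit mentioned in the abstract is the luminosity at which radiation pressure on the gas balances gravity. In its classical form, for electron scattering in pure hydrogen, L_Edd = 4πGMm_pc/σ_T, which is about 3.3 × 10⁴ L☉ per solar mass. The Python sketch below evaluates that textbook expression. Real stellar opacities differ, and the masses in the example loop are illustrative, not η Car's measured mass.

```python
import math

# Physical constants (SI units, rounded).
G = 6.674e-11          # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8            # speed of light, m s^-1
M_PROTON = 1.6726e-27  # proton mass, kg
SIGMA_T = 6.652e-29    # Thomson cross-section, m^2
M_SUN = 1.989e30       # solar mass, kg
L_SUN = 3.828e26       # nominal solar luminosity, W


def eddington_luminosity_w(mass_solar: float) -> float:
    """Classical Eddington luminosity (W) for electron scattering in pure hydrogen.

    L_Edd = 4 * pi * G * M * m_p * c / sigma_T; real stellar opacities differ.
    """
    return 4.0 * math.pi * G * mass_solar * M_SUN * M_PROTON * C / SIGMA_T


if __name__ == "__main__":
    for mass in (1.0, 50.0, 100.0):  # illustrative masses only
        ratio = eddington_luminosity_w(mass) / L_SUN
        print(f"M = {mass:>5.0f} M_sun -> L_Edd ≈ {ratio:.2e} L_sun")
```

Because L_Edd scales linearly with mass, even a very massive star exceeds it only during an extraordinary luminosity increase such as the Great Eruption.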

η-Car-like giant eruptions of luminous blue variable stars are
characterized by significant mass loss and an increase in luminosity
by several magnitudes. It has been thought that this increase in
luminosity drives a dense wind, producing an optically thick, cooler
pseudo-photosphere with a minimum effective temperature of 7,000 K
and an F-type spectrum. Within this model, η Car has been
considered the prototype of these “supernova imposters”.

We obtained images in the vicinity of η Car (Fig. 1) that, when
differenced, show a rich set of light echoes. The largest interval between
our images is eight years. We have also found similar echo candidates
at other positions, which we are currently monitoring. The large
brightening and long duration point to the Great Eruption as the
source of the light echoes. We have also obtained a composite light
curve of the light echoes in the Sloan Digital Sky Survey (SDSS) i filter
(see Fig. 2), showing a slow decline of several tenths of a magnitude
over half a year. This light curve is most consistent with the historical
observations of a peak in 1843, part of the 1838–1858 Great Eruption,
although further observations are necessary to be certain (see the
Supplementary Information).
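The brightness changes in this section and in the Figure 1 caption are given in magnitudes. These include a decline of several tenths of a magnitude, a brightening of two magnitudes or more, and a Lesser Eruption that brightened by only a magnitude. Magnitudes are logarithmic, Δm = 2.5 log10(F1/F2), so these figures translate into flux ratios as in the Python sketch below. The function names are illustrative.

```python
import math


def flux_ratio(delta_mag: float) -> float:
    """Flux ratio implied by a brightening of `delta_mag` magnitudes."""
    return 10.0 ** (0.4 * delta_mag)


def magnitude_difference(ratio: float) -> float:
    """Inverse of flux_ratio: magnitudes corresponding to a flux ratio."""
    return 2.5 * math.log10(ratio)


if __name__ == "__main__":
    for dm in (0.3, 1.0, 2.0):
        print(f"{dm:.1f} mag -> flux ratio {flux_ratio(dm):.2f}")
```

A two-magnitude brightening is therefore roughly a sixfold increase in flux, whereas one magnitude is only about a factor of 2.5.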

Spectra of the light echoes (see Fig. 3) show only absorption lines
characteristic of cool stellar photospheres, with no evidence of emission
lines. In particular, the Ca II infrared triplet is observed only as
absorption lines in the spectrum. Because of bright ambient nebular
emission, it is difficult to determine whether there is any Hα emission
from η Car itself, but in any case it must be weak, if present. By
cross-correlating each of our η Car echo spectra with the Ultraviolet and
Visual Echelle Spectrograph (UVES) spectral library (see Supplementary
Figs 2 and 3), we find the best agreement with supergiant spectral types
in the range of G2 to G5, with an effective temperature of around 5,000 K.
Spectral types of F7 or earlier are ruled out by our analysis (see
Supplementary Information for more details).
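Template matching of this kind picks the library spectrum that correlates best with the observed one. The Python sketch below compares spectra only at zero velocity shift, using the Pearson correlation coefficient, on a shared wavelength grid. A full cross-correlation would also scan over velocity offsets, and the UVES library spectra are not reproduced here. The two templates and the "observed" spectrum are synthetic Gaussian-line placeholders, and all names are hypothetical.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Zero-lag normalized cross-correlation (Pearson r) of two spectra."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("spectra must share a grid with at least two points")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def best_template(observed: list[float], templates: dict[str, list[float]]):
    """Return the best-matching template name and all scores."""
    scores = {name: pearson(observed, flux) for name, flux in templates.items()}
    return max(scores, key=scores.get), scores


def absorption_spectrum(grid, lines):
    """Unit continuum with Gaussian absorption lines given as (centre, depth, sigma)."""
    return [1.0 - sum(d * math.exp(-0.5 * ((w - c) / s) ** 2) for c, d, s in lines)
            for w in grid]


if __name__ == "__main__":
    grid = [8400.0 + 2.0 * i for i in range(150)]  # Å, hypothetical common grid
    templates = {  # hypothetical synthetic templates, not real library spectra
        "template A": absorption_spectrum(grid, [(8498, 0.5, 3), (8542, 0.7, 3), (8662, 0.6, 3)]),
        "template B": absorption_spectrum(grid, [(8498, 0.2, 3), (8542, 0.3, 3),
                                                 (8600, 0.5, 3), (8662, 0.25, 3)]),
    }
    observed = absorption_spectrum(grid, [(8498, 0.45, 3), (8542, 0.65, 3), (8662, 0.55, 3)])
    name, scores = best_template(observed, templates)
    print(name, {k: round(v, 3) for k, v in scores.items()})
```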

Doppler shifts of absorption features in the echo spectra provide
direct information about the outflow speeds during the eruption. The
Ca II infrared triplet absorption features in the spectrum are noticeably
blueshifted (see Supplementary Fig. 2). By cross-correlation with
G-type templates, we determine velocities of −202 ± 9, −210 ± 14
and −237 ± 17 km s−1 (errors are standard deviations) for our three
individual spectra and an average velocity of −210 ± 30 km s−1, which
includes an uncertainty for the dust sheet velocity.
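Two small calculations sit behind these numbers. The first is the non-relativistic Doppler relation v = c(λobs − λrest)/λrest, where a negative velocity means a blueshift. The second is the combination of the three per-spectrum velocities. The Python sketch below uses an inverse-variance weighted mean for the second step as one plausible choice; the text does not state how the average was formed. The quoted ±30 km s−1 also folds in the dust-sheet velocity uncertainty, which this sketch does not model, so its statistical error comes out smaller. The Ca II rest wavelength of 8542.09 Å (air) is an assumed input, and the 5.98 Å shift is an illustrative value.

```python
import math

C_KM_S = 299_792.458  # speed of light, km/s


def doppler_velocity_km_s(observed_a: float, rest_a: float) -> float:
    """Non-relativistic radial velocity; negative means blueshift (approaching)."""
    return C_KM_S * (observed_a - rest_a) / rest_a


def weighted_mean(values: list[float], sigmas: list[float]) -> tuple[float, float]:
    """Inverse-variance weighted mean and its statistical uncertainty."""
    weights = [1.0 / s ** 2 for s in sigmas]
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    return mean, 1.0 / math.sqrt(total)


if __name__ == "__main__":
    rest = 8542.09  # Ca II triplet line, air wavelength in Å (assumed value)
    print(f"{doppler_velocity_km_s(rest - 5.98, rest):.1f} km/s")
    mean, err = weighted_mean([-202.0, -210.0, -237.0], [9.0, 14.0, 17.0])
    print(f"{mean:.0f} ± {err:.0f} km/s (statistical only)")
```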

The bipolar nature of the Homunculus nebula shows that the η Car
Great Eruption was strongly aspherical. It has been predicted that the
outflow speeds one would derive from the spectra of η Car in
outburst, looking at the poles and at the equator of the double lobes,
would be about −650 km s−1 and −40 to −100 km s−1, respectively
(the speeds are negative because they are blueshifted, that is, the
outflow is moving towards us; outflow speeds near the equator have a
steep latitude dependence). The light echo we investigate here arises
from latitudes near the equator of η Car (see Supplementary Fig. 1),
and the measured blueshifted velocity of −210 ± 30 km s−1 is in good
agreement with expansion speeds within ±20° of the equatorial plane.
We also find a strong asymmetry in the Ca II infrared triplet, extending
to a velocity of −850 km s−1, well below the speed of the fastest polar
ejecta found previously, but in good agreement with speeds observed
in the blast wave at lower latitudes. Future observations of light echoes
viewing the η Car eruption from different directions, in particular
from the poles, have the potential to reveal these very-high-velocity
ejecta and other asymmetries.

A characteristic of luminous-blue-variable outbursts is their
transition from a hot quiescent state to a cooler outburst state, although
this feature is less well observed for the giant eruptions (see Fig. 4).
Two potential models for luminous-blue-variable outbursts involve either
an opaque stellar wind driven by an increase in luminosity, or a
hydrodynamic explosion. The traditional mechanism proposed for
η-Car-like giant eruptions has been that an unexplained increase in
luminosity drives a denser wind, so that an optically thick
pseudo-photosphere forms at a layer much larger and cooler than the
hydrostatic stellar surface. This model predicts a minimum effective
temperature of 7,000 K, resembling A- or F-type supergiants, because
the wind opacity depends on the temperature (see Fig. 4). A giant
eruption evidently occurs as a massive star attempts to evolve redward
and encounters the Humphreys–Davidson limit, beyond which no stable
stars are observed.

Figure 1 | η Car light echoes. The left panel shows the positions of the star
η Carinae and our images (white box), plotted on an image in the light of three
different emission lines: oxygen (blue), hydrogen (green) and sulphur (red).
(Photo taken by N.S.) The middle panels show the images obtained with the
CTIO 4-m Blanco telescope of a region about 0.5° to the south of η Car on
10 March 2003 (epoch A), 10 May 2010 (epoch B) and 6 February 2011 (epoch C),
from top to bottom. The right panels show the difference images 'C minus A'
and 'C minus B' at the top and middle, respectively. Example light-echo
positions are indicated with blue (epoch A) and red (epochs B and C) arrows.
The bottom right panel shows a zoom of the spectrograph slit, indicated with a
blue line. For all panels north is up and east is to the left. Applying the vector
method that previously allowed us to identify the source of the light echoes
from the supernovae that produced the supernova remnants SNR 0509-67.5,
Cas A and Tycho, we find that a dramatic brightening of η Car must be the
origin. In these echoes, unlike those of Galactic supernovae, there is still
significant spatial overlap even at separations of one light-year, suggesting that
the duration of the event causing them must be significantly longer than one
year. We also see brightening of two magnitudes or more within eight years.
Thus, the Lesser Eruption from 1887 to 1896, which brightened by only a
magnitude, is excluded as the source.

Figure 2 | Historical and light-echo light curve of η Car. The historical light
curve in visual apparent magnitudes is shown with black circles and lines, with
error bars indicating approximate uncertainties in these eye estimates. Light-
echo brightnesses (SDSS i; error bars are the standard deviation) from our eight
modern images spanning about eight years are displayed shifted by 174.2 Earth
years (green circles), 167.95 years (red circles) and 166.28 years (blue circles), to
illustrate the best-matching time delays for the 1838, 1843 and 1845 outbursts,
respectively. The first epoch is an upper limit indicated with an arrow. The
upper panel shows the full time range of the Great Eruption and therefore
shows all three potential matches, whereas the lower panels show the
brightnesses from seven of our eight modern epochs in a magnified time period
around each peak.
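The Figure 2 caption expresses each candidate match as a time delay of roughly 166 to 174 years. For a light echo, delay and geometry are linked. Light scattered by dust at projected offset ρ and line-of-sight position z (measured from the source towards the observer) travels an extra path √(ρ² + z²) − z = ct. Solving for z gives the standard echo paraboloid z = ρ²/(2ct) − ct/2. The Python sketch below evaluates that relation in light-years and years. The inputs, 2.3 kpc from the abstract and ~0.5° from the Figure 1 caption, are approximate, and the output is an illustration, not a dust geometry derived in the paper. The function names are hypothetical.

```python
import math

LY_PER_PC = 3.2616  # light-years per parsec (rounded)


def echo_depth_ly(rho_ly: float, delay_yr: float) -> float:
    """Line-of-sight coordinate z (ly) of the scattering dust.

    z is measured from the source towards the observer (negative = behind the
    source). With c = 1 ly/yr, the light-echo paraboloid gives
    z = rho**2 / (2 t) - t / 2.
    """
    if delay_yr <= 0:
        raise ValueError("delay must be positive")
    return rho_ly ** 2 / (2.0 * delay_yr) - delay_yr / 2.0


def projected_offset_ly(distance_pc: float, angle_deg: float) -> float:
    """Projected separation (ly) of a point `angle_deg` away from the source."""
    return distance_pc * LY_PER_PC * math.tan(math.radians(angle_deg))


if __name__ == "__main__":
    rho = projected_offset_ly(2300.0, 0.5)  # 2.3 kpc and ~0.5°, both approximate
    for delay in (174.2, 167.95, 166.28):    # candidate delays from Fig. 2
        print(f"t = {delay:7.2f} yr -> z ≈ {echo_depth_ly(rho, delay):6.1f} ly")
```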

Surprisingly, our G-type light-echo spectrum of the η Car Great
Eruption is inconsistent with expectations of an opaque-wind model
(see Fig. 4). With this model, it is difficult to explain the high
(10⁵⁰ erg) kinetic energy and the presence of a fast blast wave at large
radii. Instead, these observations point towards a hydrodynamic
explosion that must have influenced the Great Eruption.

The first visual spectroscopic observations of η Car around 1870
showed emission lines. A photographic spectrogram obtained during
its Lesser Eruption around 1890 resembles an F-type supergiant
blueshifted by −200 km s⁻¹, with moderate P Cygni hydrogen profiles,
as expected in the opaque-wind model. The difference
between the 1890 spectrum and our light-echo spectrum of the Great
Eruption is therefore quite striking, indicating that two distinct physical
processes may have been involved in two outbursts of the same
object. However, the 1890 event also produced a mass ejection, the
Little Homunculus, with the same axial symmetry (although much
smaller mass) as the Great Eruption.
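To give a sense of scale for the −200 km s⁻¹ blueshift, the non-relativistic Doppler formula converts it into a wavelength offset. This is a sketch only; the Hα rest wavelength used (6,562.8 Å, an approximate air value) is our choice of example line, not one singled out in the text.

```python
C_KM_S = 299_792.458  # speed of light in km/s


def doppler_wavelength(rest_A: float, velocity_km_s: float) -> float:
    """Observed wavelength in Å for a line-of-sight velocity.

    Negative velocity means approach (blueshift). Uses the non-relativistic
    approximation lambda_obs = lambda_0 * (1 + v/c), adequate for |v| << c.
    """
    return rest_A * (1.0 + velocity_km_s / C_KM_S)


H_ALPHA_A = 6562.8  # approximate air wavelength of H-alpha in Å (assumed example line)
observed = doppler_wavelength(H_ALPHA_A, -200.0)
print(f"H-alpha observed at {observed:.1f} Å, offset {observed - H_ALPHA_A:.1f} Å")
```

The offset is a few ångströms toward the blue; the same fractional shift, v/c ≈ 6.7 × 10⁻⁴, applies to every line in the spectrum.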

Luminous-blue-variable giant eruptions are rare, and have only 
been recorded twice in our Galaxy in the last 400 years: the Great 
Eruption of η Car and the giant eruption of P Cygni in the seventeenth
century. Because of their considerable intrinsic brightness just
below the luminosity of faint core-collapse supernovae, about two
dozen giant eruption candidates, called supernova imposters because
they have often been mistaken for supernovae, have been found in
various extragalactic supernova searches. Typically, the hotter
Figure 3 | Light echo spectra of the Great Eruption of η Car. The three
optical low-resolution spectra of the light echo (black lines) were taken at J2000
position right ascension 10h 44 min 12.127 s and declination −60° 16′ 01.69″
in March and April 2011 at the Magellan I 6.5-m and du Pont 2.5-m
telescopes of the Las Campanas Observatory, Chile. A log of the spectroscopic
observations and details of the spectra are presented in Supplementary Table 1.
The slit positions differ only slightly in slit angle. The spectra were reduced
using standard techniques and then wavelength-calibrated using observations
of an HeNeAr lamp. The wavelength calibration was checked and corrected
using night-sky emission lines, especially [O I] λ5,577 Å, and OH lines in the
red part of the spectrum. We flux-calibrated the spectra using a flux standard
observed the same night as the science observations. The left panel shows the
spectra from 5,000 to 9,000 Å. The spectra are corrected neither for reddening nor
for the blueward scattering by the dust. For comparison, the blue lines show
spectra of three examples of supernova imposters: SN 1997bs, SN 2009ip and
UGC 2773-OT1. The right panel shows the Hα and [N II] emission lines. We
note that the background emission-line subtraction is incomplete because the
emission lines vary spatially. Also, EC1A Hα is at the edge of the chip and is
therefore uncertain. Crossed circles indicate the locations of atmospheric
absorption lines.


supernova imposters have steep blue continua, stronger and broader 
Balmer lines, and relatively weak absorption, whereas the cooler ones 
tend to have redder continua, weaker and narrower Balmer lines, 
strong [Ca II] and Ca II emission, deeper P Cygni absorption features,
and in some cases stronger absorption spectra similar to those of
F-type supergiants. However, the η Car Great Eruption light-echo
spectrum is quite different. Its spectral type is G2 to G5, significantly
later than all other supernova imposters at peak. Furthermore, the Ca II
infrared triplet lines are only in absorption. For the extreme mass-loss
rates required for the Great Eruption of η Car, another process must
give rise to the apparent temperature. 

The Great Eruption of η Car has been considered the prototype of
the extragalactic supernova imposters or η Car analogues, even though
it is actually an extreme case in terms of radiated energy (10⁴⁹·⁷ erg),
kinetic energy (>10⁵⁰ erg) and its decade-long duration. The spectra
of the light echo now indicate that it is not only extreme but a
different, unique object. It is difficult to see how strong emission lines
could be avoided in an opaque wind where the continuum photosphere
is determined by a change in opacity, and its temperature
and broad absorption lines are more consistent with the opaque cooling
photosphere of an explosion. What triggered such an explosion
and why the huge mass loss did not destroy the star are still
unknown, but predictions from future radiative transfer simulations
trying to explain η Car and its Great Eruption can now be matched to
Figure 4 | Hertzsprung-Russell diagram with luminous blue variables and
η Car. Adaptation of a Hertzsprung-Russell diagram showing luminous blue
variables, related hypergiant stars, and the peak luminosities of luminous-blue-variable-like
eruptions. The grey bands denote the typical locations of luminous
blue variables in quiescence (left, diagonal band) and during the S Doradus-like
outburst. Temperatures for the Great Eruption and the 1890 eruption of η Car
are based on the echo spectra presented here and the F-type spectrum of the
1890 event, respectively. The temperature of 10,000 K for SN 2009ip is based
on the observed continuum shape, but this is only a lower limit because of the
possible effects of circumstellar or host galaxy reddening. Because of the
presence of He I lines in the spectrum, the true temperature is probably much
higher. The 8,500 K temperature of UGC 2773-OT1 is indicated by the F-type
absorption features in its spectrum, and this temperature is relatively
independent of reddening (refs 27, 28).


these spectral observations. Alternative models, such as the ones that 
use mass accretion from the companion star during periapsis passage 
as a trigger for the eruption, can be verified or dismissed.
Magnetic reconnection from a multiscale instability cascade


Auna L. Moser¹ & Paul M. Bellan¹


Magnetic reconnection, the process whereby magnetic field lines
break and then reconnect to form a different topology, underlies
critical dynamics of magnetically confined plasmas in both nature
and the laboratory. Magnetic reconnection involves localized
diffusion of the magnetic field across plasma, yet observed reconnection
rates are typically much higher than can be accounted for using
classical electrical resistivity. It is generally proposed that the field
diffusion underlying fast reconnection results instead from some
combination of non-magnetohydrodynamic processes that become
important on the ‘microscopic’ scale of the ion Larmor radius or the
ion skin depth. A recent laboratory experiment demonstrated a
transition from slow to fast magnetic reconnection when a current
channel narrowed to a microscopic scale, but did not address how a
macroscopic magnetohydrodynamic system accesses the microscale.
Recent theoretical models and numerical simulations suggest
that a macroscopic, two-dimensional magnetohydrodynamic current
sheet might do this through a sequence of repetitive tearing and
thinning into two-dimensional magnetized plasma structures having
successively finer scales. Here we report observations demonstrating
a cascade of instabilities from a distinct, macroscopic-scale magnetohydrodynamic
instability to a distinct, microscopic-scale (ion skin
depth) instability associated with fast magnetic reconnection. These
observations resolve the full three-dimensional dynamics and give
insight into the frequently impulsive nature of reconnection in space
and laboratory plasmas.

The experiment (Fig. 1) involves a long, slender, current-carrying
magnetized plasma jet that evolves over the course of ~50 μs. The jet
front travels at ~10 km s⁻¹, increasing the jet length until the current-driven
kink instability, an ideal magnetohydrodynamic (MHD)
phenomenon, sets in and deforms the plasma jet into a helical structure
whose amplitude grows in time (Fig. 2, Supplementary Fig. 1
and Supplementary Movie 1). The experiments used hydrogen,
nitrogen or argon plasma. In all three, the kink amplitude
growth rate was observed to have one of two distinct behaviours: linear
growth or exponential growth (Fig. 2b). Because argon provides
the clearest images, argon data will be used in the following detailed
discussion.
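Fig. 2b separates the two kink behaviours by fitting the measured amplitudes. A minimal sketch of one way to make that call, assuming noise-free amplitude samples and comparing a straight-line fit against a log-linear (exponential) fit; the data below are synthetic, the growth rate is a hypothetical value rather than a measurement, and this is not necessarily the fitting procedure used for Fig. 2b.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares fit of ys = a + b * xs; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def classify_growth(times, amplitudes):
    """Label kink-amplitude data as 'linear' or 'exponential' growth.

    Fits amplitude = a + b t and ln(amplitude) = a' + gamma t, then compares
    the squared residuals of both models in amplitude units. Returns the
    label and the fitted exponential rate gamma (per unit of `times`).
    """
    a_lin, b_lin = linear_fit(times, amplitudes)
    a_log, gamma = linear_fit(times, [math.log(e) for e in amplitudes])
    sse_lin = sum((e - (a_lin + b_lin * t)) ** 2 for t, e in zip(times, amplitudes))
    sse_exp = sum((e - math.exp(a_log + gamma * t)) ** 2 for t, e in zip(times, amplitudes))
    label = "exponential" if sse_exp < sse_lin else "linear"
    return label, gamma


# Synthetic (hypothetical) data: one camera frame per microsecond.
t = [i * 1e-6 for i in range(6)]
exp_kink = [0.5 * math.exp(5e5 * ti) for ti in t]  # cm, hypothetical gamma = 5e5 s^-1
lin_kink = [1.0 + 2.0e6 * ti for ti in t]          # cm, 2 cm per microsecond
print(classify_growth(t, exp_kink))
print(classify_growth(t, lin_kink))
```

Comparing residuals in amplitude units, rather than in log units, avoids favouring the exponential model just because the logarithm compresses large amplitudes.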

In the case of an exponentially growing kink, the jet segment develops
a periodic fine structure (Fig. 3). The fine-structure growth rate, location
and spatial periodicity are consistent with the magnetized plasma
Rayleigh-Taylor instability. This instability develops in a gravitational
field at an interface where a heavy fluid with density $\rho_2$ lies above a light
fluid with density $\rho_1$. The acceleration of the exponentially growing
kink segment creates an effective gravitational field, $g_{\mathrm{eff}}$, in the plasma
frame. To an observer in the frame of the accelerating filamentary kink
segment at the location of the periodic structure (on the inward side of
the outward-accelerating filament, that is, the trailing side), the plasma
filament appears to be a heavy fluid sitting on top of trailing low-density
fluid immediately exterior to the filament.

The fastest-growing mode of the magnetized plasma Rayleigh-Taylor
instability has $\mathbf{k}\cdot\mathbf{B} = 0$ (where $\mathbf{k}$ is the instability wavevector
and $\mathbf{B}$ is the magnetic field vector), a property that will be useful in the
interpretation of the experiment. If we assume that the plasma density
inside the filament greatly exceeds the density immediately outside, that
is, $\rho_2 \gg \rho_1$, then the exponential growth rate for the fastest-growing
Rayleigh-Taylor mode is $\gamma_{\mathrm{RT}} \sim \sqrt{g_{\mathrm{eff}}\,k}$, where $k = 2\pi/\lambda$ and $\lambda$ is the
wavelength of the fine structure. For the plasma shown in Fig. 3, the
calculated growth rate is $\gamma_{\mathrm{RT}} \sim 3 \times 10^{6}$ s⁻¹ and the visually observed
Figure 1 | Geometry of experimental set-up. The experiment produces a
magnetized plasma jet using the coplanar disk and annulus electrodes and the
background poloidal magnetic field coil shown in cut-away view. The 20-cm-diameter
cathode (inner electrode) and the surrounding 50-cm-outer-diameter
anode (outer electrode) are mounted on the end dome of a 1.4-m-diameter, 1.6-m-long
stainless steel vacuum chamber. To make the magnetized plasma jet,
first the coil creates a poloidal magnetic field linking the disk to the annulus.
Fast pulsed valves then puff in gas at eight locations in front of the anode and
eight locations in front of the cathode. A high-energy capacitor bank then
applies a potential of 5–6 kV across the electrodes, breaking the gas down into a
plasma. The electric current from the capacitor bank ramps up to a peak
amplitude of ~110 kA, with a full-width at half-maximum of ~60 μs. The
electric current flows along the jet column, producing a ~0.1-T toroidal (φ
direction) magnetic field, and completes its path along an outer, low-density,
low-poloidal-magnetic-field shroud surrounding the jet. The jet is propelled in
the z direction by the combination of the axial gradient of the toroidal magnetic
field and the axial gradient of the hydrodynamic pressure resulting from the
radial pinch force due to the current. Diagnostic devices include a visible-light
fast framing camera with a 20-ns shutter and an adjustable ~1-μs interframe
time, a gated, 12-channel linear spectroscopic array (lines of sight indicated in
red) with a 2-cm line-of-sight separation and a 1-μs time resolution, a
capacitively coupled probe and extreme-ultraviolet (EUV) diodes sensitive
within the 10–75-eV energy range.


¹Applied Physics, California Institute of Technology, Pasadena, California 91125, USA.
growth rate is $\gamma_{\mathrm{RT}} \sim 10^{6}$ s⁻¹. The agreement between these two rates
and the observation that the instability is located on the trailing side of
the transversely accelerated filament together support the conclusion
that the fine-scale instability is a Rayleigh-Taylor instability.
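The two rates compared above follow from simple arithmetic. A sketch, assuming the inputs quoted in the Fig. 3 caption (effective acceleration ~4 × 10¹⁰ m s⁻², fine-structure wavelength ~2 cm, amplitude growing from ~0.5 cm to ~1.5 cm in 1 μs) and treating the observed growth as a single exponential over that interval; the function names are illustrative.

```python
import math


def rt_growth_rate(g_eff: float, wavelength: float) -> float:
    """Fastest-growing magnetized Rayleigh-Taylor rate, gamma ~ sqrt(g_eff * k).

    Assumes the heavy-over-light limit rho2 >> rho1, with k = 2*pi / wavelength.
    SI units: g_eff in m s^-2, wavelength in m, result in s^-1.
    """
    k = 2 * math.pi / wavelength
    return math.sqrt(g_eff * k)


def observed_rate(a_start: float, a_end: float, dt: float) -> float:
    """Exponential rate implied by an amplitude growing from a_start to a_end in dt."""
    return math.log(a_end / a_start) / dt


calculated = rt_growth_rate(4e10, 0.02)         # g_eff ~ 4e10 m s^-2, lambda ~ 2 cm
observed = observed_rate(0.5e-2, 1.5e-2, 1e-6)  # ~0.5 cm -> ~1.5 cm in 1 us
print(f"calculated {calculated:.1e} s^-1, observed {observed:.1e} s^-1")
```

Both values come out of order 10⁶ s⁻¹ (about 3.5 × 10⁶ and 1.1 × 10⁶ s⁻¹), which is the sense of agreement here: an order-of-magnitude match rather than a precise one.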




Figure 3 | Time series images of plasma jet evolution. For exponential kink 
amplitude growth, shown here, a segment of the kinked jet quickly narrows to a 
thin filament, which then brightens while developing a sharp, distinctive, 
periodic fine structure on the trailing side of the radially outward-accelerating 
filament. As the fine-structure amplitude grows, it erodes the filament diameter 
until the filament breaks up. The dashed line in the first image shows the 
position of the inner electrode. The measured transverse acceleration of the
filament when the periodic fine structure first appears is $g_{\mathrm{eff}} \sim 4 \times 10^{10}$ m s⁻².
Figure 2 | Growth of the amplitude of the kink instability. a, Example of
measurement of the kink amplitude, $\epsilon$, in a fast framing camera image of shot
no. 10,934. The plasma jet undergoes a kink instability in agreement with the
prediction of Kruskal-Shafranov theory: the jet is unstable to kinking when
$\mu_0 I/\psi > 4\pi/L$, where $L$ is the jet length, $I$ is the current measured at the
electrodes and $\psi$ is the poloidal magnetic flux measured at the electrodes.
Image intensity is logarithmically scaled and false coloured. b, Comparison of
exponential kink growth rate and linear kink growth rate as measured from fast
camera images. Black points indicate kink amplitude with exponential growth,
$\epsilon \propto \exp(\gamma_{\mathrm{kink}} t)$, where $\gamma_{\mathrm{kink}} = 8.3 \times 10^{5}$ s⁻¹ (shot no. 10,934). Brown points
indicate kink amplitude with linear growth, $\epsilon \propto t$ (shot no. 10,930). Lines show
best fits to the data. Error bars indicate the range of possible plasma edge
positions based on image intensity.
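Because the jet lengthens while its current and flux change comparatively slowly, the Kruskal-Shafranov condition in the Fig. 2 caption, which we read as μ₀IL/ψ > 4π, can be recast as a critical length L_c = 4πψ/(μ₀I) beyond which kinking sets in. A sketch under that reading; the 100-kA current is of the order of the ~110-kA peak quoted for Fig. 1, while the 1-mWb flux is purely hypothetical and not taken from the text.

```python
import math

MU_0 = 4e-7 * math.pi  # vacuum permeability in T m A^-1 (classical SI value)


def kink_unstable(current: float, flux: float, length: float) -> bool:
    """Kruskal-Shafranov test for the jet: unstable when mu0 * I / psi > 4*pi / L.

    current in A, poloidal flux psi in Wb, jet length L in m.
    """
    return MU_0 * current / flux > 4 * math.pi / length


def critical_length(current: float, flux: float) -> float:
    """Jet length beyond which the criterion is met: L_c = 4*pi*psi / (mu0 * I)."""
    return 4 * math.pi * flux / (MU_0 * current)


# Hypothetical operating point: 100 kA and 1 mWb (the flux is NOT from the text).
I_JET, PSI = 1.0e5, 1.0e-3
print(f"critical length ~ {critical_length(I_JET, PSI):.2f} m")
for L in (0.05, 0.2):
    print(L, kink_unstable(I_JET, PSI, L))
```

With these inputs the jet would become kink-unstable once it grows past about 10 cm; doubling the flux doubles that length, and doubling the current halves it.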


When the thin filament breaks up as a result of the Rayleigh-Taylor 
instability, the portion of the jet beyond the break-up region retains its 
magnetic structure and separates from the remaining jet base (Fig. 3 
and Supplementary Fig. 2), demonstrating a clear magnetic reconnection.
Several additional diagnostics support the conclusion that
magnetic reconnection is occurring. Photodiodes sensitive within
the 10–75-eV energy range measure a burst of extreme ultraviolet
radiation coincident with the filament break-up (Supplementary Fig. 3).
A capacitively coupled probe placed in the plasma jet between the
electrodes and the filament measures an order-of-magnitude increase
in emissions in the whistler-wave frequency range coincident with the
filament break-up (Supplementary Fig. 4).

We can verify that the reconnecting plasma is indeed at the non-MHD
microscale by recalling that ideal MHD is based on the
presumption that $v_d/v_A \ll 1$. Here $v_d = J_z/(nq)$ is the electron drift
velocity for an axial current density $J_z$, where $n$ is particle density
and $q$ is electron charge; and $v_A = B/\sqrt{\mu_0 n m_i}$ is the Alfvén velocity,
where $B$ is the magnetic field strength, $m_i$ is the ion mass and $\mu_0$ is the
permeability of free space. When $v_d/v_A$ becomes of order one, the
assumptions underlying ideal MHD fail because kinetic effects (that
is, wave-particle interaction) and Hall term effects (that is, decoupling
of electron and ion perpendicular motions) become important. If we
assume that $B \sim B_z$ and use $\mathbf{k}\cdot\mathbf{B} = 0$ (which holds for the fastest-growing
Rayleigh-Taylor mode), we see that




The axial wavelength, as measured from the fast camera images, is $\lambda_z \sim 2$ cm,
which implies that $k \sim 300$ m⁻¹. These measurements give a calculated
Rayleigh-Taylor growth rate of $\gamma_{\mathrm{RT}} \sim 3 \times 10^{6}$ s⁻¹. For comparison, the growth
rate can be estimated directly from the fast camera images. The fine-structure
amplitude is ~0.5 cm when it first appears and grows to ~1.5 cm in 1 μs. This
corresponds to an observed growth rate of $\gamma_{\mathrm{RT}} \sim 1 \times 10^{6}$ s⁻¹, in agreement with
the calculated growth rate. All images are from shot no. 11,225; image intensity
is logarithmically scaled and false coloured. See also Supplementary Movie 2.
$$\mu_0 J_z = \frac{1}{r}\frac{\partial (r B_\phi)}{\partial r} \sim \frac{2B_\phi}{r} \sim \frac{4\pi}{\lambda_z}\,B_z,$$
where $\lambda_z$ is the local axial wavelength. This means that
$v_d/v_A \sim (4\pi/\lambda_z)(c/\omega_{pi})$, where $c$ is the speed of light and $\omega_{pi}$ is the
ion plasma frequency ($c/\omega_{pi}$ is the ion skin depth).

Using $c/\omega_{pi}$ calculated with $n \sim 10^{23}$ m⁻³ as determined from Stark-broadening
density measurements and the measured fine-structure
local axial wavelength, $\lambda_z \approx 2$ cm, we find that $v_d/v_A \sim O(1)$. This
shows that the filament observed to have a Rayleigh-Taylor instability
is in the non-MHD regime at the time of magnetic reconnection.
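The estimate $v_d/v_A \sim (4\pi/\lambda_z)(c/\omega_{pi})$ is easy to evaluate, and it also shows why hydrogen and argon differ at the same density, since $c/\omega_{pi} = \sqrt{m_i/(\mu_0 n q^2)}$ scales as $\sqrt{m_i}$. A sketch assuming singly charged ions, λ_z = 2 cm and an illustrative density of 10²³ m⁻³ (our assumption for the example); the helper names are hypothetical.

```python
import math

MU_0 = 4e-7 * math.pi       # vacuum permeability (H m^-1)
E_CHARGE = 1.602176634e-19  # elementary charge (C)
AMU = 1.66053906660e-27     # atomic mass unit (kg)


def ion_skin_depth(n_m3: float, mass_amu: float) -> float:
    """c / omega_pi = sqrt(m_i / (mu0 n q^2)) for singly charged ions, in metres."""
    m_i = mass_amu * AMU
    return math.sqrt(m_i / (MU_0 * n_m3 * E_CHARGE ** 2))


def drift_to_alfven(n_m3: float, mass_amu: float, axial_wavelength_m: float) -> float:
    """v_d / v_A ~ (4*pi / lambda_z) * (c / omega_pi), from k.B = 0 and B ~ B_z."""
    return 4 * math.pi / axial_wavelength_m * ion_skin_depth(n_m3, mass_amu)


N_ASSUMED = 1e23  # hypothetical density in m^-3, chosen for illustration only
for name, mass in [("argon", 39.95), ("hydrogen", 1.008)]:
    d_i = ion_skin_depth(N_ASSUMED, mass)
    ratio = drift_to_alfven(N_ASSUMED, mass, 0.02)
    print(f"{name}: c/omega_pi = {d_i * 100:.2f} cm, v_d/v_A ~ {ratio:.2f}")
```

For these inputs argon gives c/ω_pi ≈ 0.45 cm and v_d/v_A ≈ 3, while the hydrogen skin depth is smaller by √(39.95/1.008) ≈ 6.3, so a hydrogen filament would have to be eroded much further before reaching the ion-skin-depth scale.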

Because the original kink instability was in the ideal-MHD regime, 
our argon plasma observations thus show a cascade from an ideal-MHD
macroscopic kink instability to a non-ideal, microscopic
Rayleigh-Taylor instability associated with magnetic reconnection. 
A question that remains to be answered is what determines whether 
the kink instability grows linearly or exponentially. 

By contrast with its behaviour in argon, the Rayleigh-Taylor instability
in hydrogen fails to break the plasma filament. Although we still
observe an exponentially growing kink and subsequent Rayleigh-Taylor
instability in hydrogen, the ion skin depth, $c/\omega_{pi}$, is substantially
smaller than in argon and the Rayleigh-Taylor instability fails to erode
the plasma filament to a diameter less than this smaller depth
(Supplementary Fig. 5), though we are not yet certain why. Thus, 
the hydrogen plasma diameter has not been reduced to the necessary 
microscale and no magnetic reconnection takes place. 

Reconnection is observed in the nominal parameter regime of these 
experiments only when preceded by the Rayleigh-Taylor instability; this 
implies that the Rayleigh-Taylor instability is necessary for the observed 
reconnection to occur. However, the comparison between argon and 
hydrogen plasmas shows that the mere existence of the Rayleigh-Taylor
instability is not sufficient for reconnection: the Rayleigh-Taylor
instability must become large enough to erode the filament
diameter to less than the ion skin depth for magnetic reconnection to 
occur. 

Our observations demonstrate one possible mechanism by which a 
macroscopic MHD system can couple to the microscale processes 
necessary for magnetic reconnection. The essential components of this 
mechanism have been separately observed in nature. For example, in 
the solar corona, current-carrying magnetic flux tubes confining
plasma with density higher than the ambient value are common, as
is kinking of such flux tubes, and Rayleigh-Taylor instabilities have
been observed.

Given that we observe this mechanism in a range of laboratory experiments,
we think it quite plausible that it will also occur in astrophysical
systems with appropriate physical parameters. As a possible example, we
note that one solar observation reported the lateral acceleration of a
plasma-filled flux tube at ~1 km s⁻². Assuming that such acceleration is
typical, for a 10³-km-wide solar loop a Rayleigh-Taylor disturbance with
an axial wavelength of 400 km and an initial amplitude of 0.5 km would
grow exponentially to the 10³-km loop width in about 1 min and so
erode the current channel to a microscopic scale. Because the instability
wavelength assumed in this example is at the margin of existing
resolution capabilities, measurements that simultaneously resolve the
widely separated macro- and microscales would be challenging.
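The "about 1 min" estimate follows from the same growth-rate formula, $\gamma_{\mathrm{RT}} \sim \sqrt{g k}$. A sketch assuming the ~1 km s⁻² acceleration, 400-km wavelength and 0.5-km seed amplitude of the example, with a loop width of 10³ km (and 10⁴ km for comparison); the function name is illustrative.

```python
import math


def rt_growth_time(g: float, wavelength: float, a_start: float, a_end: float) -> float:
    """Time for a Rayleigh-Taylor mode to grow from a_start to a_end.

    Uses gamma = sqrt(g * k), k = 2*pi / wavelength; SI units throughout.
    """
    gamma = math.sqrt(g * 2 * math.pi / wavelength)
    return math.log(a_end / a_start) / gamma


G_SOLAR = 1.0e3     # ~1 km s^-2 lateral acceleration, in m s^-2
WAVELENGTH = 4.0e5  # 400-km axial wavelength, in m
A0 = 5.0e2          # 0.5-km initial amplitude, in m
for width_km in (1e3, 1e4):
    t = rt_growth_time(G_SOLAR, WAVELENGTH, A0, width_km * 1e3)
    print(f"loop width {width_km:.0e} km: ~{t:.0f} s")
```

The e-folding time is about 8 s, so growth to 10³ km takes roughly 1 min and to 10⁴ km only a little longer; because the growth is exponential, the answer depends just logarithmically on the final width.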
Realization of three-qubit quantum error correction
with superconducting circuits


M. D. Reed, L. DiCarlo, S. E. Nigg, L. Sun, L. Frunzio, S. M. Girvin & R. J. Schoelkopf


Quantum computers could be used to solve certain problems
exponentially faster than classical computers, but are challenging
to build because of their increased susceptibility to errors.
However, it is possible to detect and correct errors without destroying
coherence, by using quantum error correcting codes. The
simplest of these are three-quantum-bit (three-qubit) codes, which
map a one-qubit state to an entangled three-qubit state; they can
correct any single phase-flip or bit-flip error on one of the three
qubits, depending on the code used. Here we demonstrate such
phase- and bit-flip error correcting codes in a superconducting
circuit. We encode a quantum state, induce errors on the qubits
and decode the error syndrome (a quantum state indicating which
error has occurred) by reversing the encoding process. This
syndrome is then used as the input to a three-qubit gate that
corrects the primary qubit if it was flipped. As the code can recover
from a single error on any qubit, the fidelity of this process should
decrease only quadratically with error probability. We implement
the correcting three-qubit gate (known as a conditional-conditional
NOT, or Toffoli, gate) in 63 nanoseconds, using an
interaction with the third excited state of a single qubit. We find
85 ± 1 per cent fidelity to the expected classical action of this gate,
and 78 ± 1 per cent fidelity to the ideal quantum process matrix.
Using this gate, we perform a single pass of both quantum bit- and
phase-flip error correction and demonstrate the predicted first-order
insensitivity to errors. Concatenation of these two codes in
a nine-qubit device would correct arbitrary single-qubit errors. In
combination with recent advances in superconducting qubit
coherence times, this could lead to scalable quantum technology.

Quantum error correction relies on detecting the presence of errors
without gaining knowledge of the encoded quantum state. In the three-qubit
error-correcting code, the subspace of the two additional ‘ancilla’
qubits uniquely encodes which of the four possible single-qubit errors
has occurred, including the possibility of no flip. Crucially, errors
consisting of finite rotations can also be corrected using these schemes
because the error syndromes are allowed to be in superpositions of the
possible outcomes, flipped and not flipped. Previous works implementing
error correcting codes in liquid-state and solid-state NMR
and with trapped ions have demonstrated two possible strategies
for using the error syndromes. The first is to measure the ancillas
(thereby projecting the syndrome) and use a classical logic operation
to correct the detected error. This ‘feed-forward’ capability is challenging
in superconducting circuits as it requires a fast and high-fidelity
quantum non-demolition measurement, but is probably a necessary
component to achieve scalable fault tolerance. The second strategy,
as recently demonstrated with trapped ions and used here, is to
replace the classical logic with a quantum controlled-controlled
NOT (CCNOT) gate that performs the correction coherently, leaving
the entropy associated with the error in the ancilla qubits, which can
then be reset if the code is to be repeated. The CCNOT gate performs
exactly the action that would follow the measurement in the first
scheme: flipping the primary qubit if and only if the ancillas encode
the associated error syndrome.


The CCNOT gate is also vital for a wide variety of applications such
as Shor’s factoring algorithm and has attracted substantial experimental
interest, with recent implementations in linear optics,
trapped ions and superconducting circuits. Here we use the
circuit quantum electrodynamics architecture to couple four superconducting
transmon qubits to a single microwave cavity bus,
where each qubit transition frequency can be controlled on nanosecond
timescales with individual flux bias lines and collectively
measured by interrogating transmission through the cavity. (The
details of the device can be found in Methods Summary and in
ref. 3.) The frequencies of the qubits, labelled Q1–Q4, are tuned
respectively to 6, 7, 7.85 and ~13 GHz, with Q4 unused. In this
Letter, we first demonstrate the three-qubit interaction used in the
gate, which is an extension of interactions used in previous two-qubit
gates, and show how this interaction can be used to create the
desired CCNOT gate. We then verify its action and use it to demonstrate
error correction for an error on a single qubit with the bit-flip
code and then for simultaneous errors on all three qubits with the
phase-flip code. We find a quadratic dependence of process fidelity
on error probability, indicating that the algorithm is correcting errors
as predicted.
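The quadratic dependence can be seen from a classical counting argument. A sketch assuming perfect gates and independent full flips, each with probability p, on all three qubits (for the phase-flip code, read "flip" in the rotated basis); this idealized model is ours and ignores the decoherence and gate errors that set the measured fidelities.

```python
def p_corrected(p: float) -> float:
    """Success probability of a three-qubit repetition code when each qubit
    independently suffers a full flip with probability p. At most one flip is
    correctable, so P = (1-p)^3 + 3p(1-p)^2 = 1 - 3p^2 + 2p^3."""
    return (1 - p) ** 3 + 3 * p * (1 - p) ** 2


def p_uncorrected(p: float) -> float:
    """Success probability of a single unprotected qubit."""
    return 1 - p


for p in (0.01, 0.05, 0.1, 0.3):
    print(f"p={p:.2f}: unprotected {p_uncorrected(p):.4f}, corrected {p_corrected(p):.4f}")
```

At p = 0.1 the corrected success probability is 0.972 against 0.9 unprotected. The failure probability 3p² − 2p³ has no linear term, which is the first-order insensitivity described above; the code stops helping at p = 0.5, where both curves equal 0.5.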

Our three-qubit gate uses an interaction with the third excited state
of one transmon. Specifically, it relies on the unique capability among
computational states (eigenstates of the Pauli operator Z) of |111⟩ to
interact with the non-computational state |003⟩ (the notation |abc⟩
refers to the excitation levels of Q1–Q3, respectively). As the direct
interaction of these states is prohibited to first order, we first transfer
the quantum amplitude of |111⟩ to the intermediate state |102⟩, which
itself couples strongly to |003⟩. Calculated energy levels and time-domain
data showing interaction between |011⟩ and |002⟩ (which is
identical to that between |111⟩ and |102⟩ except for a 6-GHz offset) as a
function of the flux bias on Q3 are shown in Fig. 1a. Once the amplitude
of |111⟩ has been transferred to |102⟩ with a sudden swap interaction,
a three-qubit phase is acquired by moving Q3 up in frequency
adiabatically, near the avoided crossing with |003⟩. Figure 1b shows the
avoided crossing between these states as a function of the flux bias on
Q3. This crossing shifts the frequency of |102⟩ relative to the sum of the
frequencies of |100⟩ and |002⟩ to yield the three-qubit phase. The
detailed procedure of the gate is shown in Fig. 2a, and is implemented
in 63 ns. Further details can be found in Supplementary Information.

We demonstrate the gate by first measuring its classical action. The
controlled-controlled phase (CCPhase) gate, which maps |111⟩ to
−|111⟩, has no effect on pure computational states, so we implement
a CCNOT gate by concatenating pre- and post-gate rotations on Q2, as
indicated in the unshaded regions of Fig. 2a. Such a gate ideally swaps
|101⟩ and |111⟩ and does nothing to the remaining states. To verify
this, we prepare the eight computational states, implement the gate and
measure its output using three-qubit state tomography to generate
the classical truth table. The intended state is reached with 85 ± 1%
fidelity on average. This measurement is sensitive only to classical
action, however, so we complete our verification by performing full
quantum process tomography on the CCPhase gate, which can detect
Figure 1 | Calculated energy spectra and time-domain measurements of the
interactions used in the three-qubit gate. a, The energy spectrum of doubly
excited states demonstrating the avoided crossing between |011⟩ and |002⟩
(identical to that between |111⟩ and |102⟩ except for a 6-GHz offset) is shown
with both a numerical diagonalization of the system Hamiltonian (top) and a
time-domain measurement as a function of the applied magnetic flux on Q3
(bottom). Top: the frequencies for the involved eigenstates are blue and the
non-interacting eigenstates of similar energy are grey. The notation |abc⟩ ⊗ |d⟩
indicates the excitation level of each qubit and the cavity photon number,
respectively. When the second ket is omitted, d = 0. Bottom: the state |011⟩ is
prepared and a square flux pulse of duration t and amplitude V is applied.
Coherent oscillations produce a ‘chevron’ pattern, with darker colours
corresponding to population left in |002⟩. h, Planck’s constant. b, The spectrum
of triply excited states showing the avoided crossing between |102⟩ and |003⟩ as
a function of the flux bias on Q3 is characterized in the same way as above. The
state |102⟩ is prepared by first making |111⟩ and then performing the swap as
described in Fig. 2. Many additional eigenstates are close in energy but are
irrelevant because they do not interact with the populated states. The large
avoided crossing between the relevant eigenstates that is used to produce an
adiabatic three-qubit interaction happens near 28 mΦ₀ (where Φ₀ is the
magnetic flux quantum). Extra lines near 31 mΦ₀ and 29 mΦ₀ are due to
higher-order interactions predicted by the Hamiltonian (|102⟩ with |030⟩ and
|003⟩ with |111⟩), as is the larger first-order interaction at 25 mΦ₀ (|102⟩ with a
hybridization of |021⟩ and |111⟩), but their effect on the protocol in Fig. 2 is
negligible.


the evolution of quantum superpositions of computational states. This 
is done by preparing 64 input states that span the computational 
Hilbert space and by performing state tomography on the result of 
the gate’s action on each state. As detailed in Supplementary Information,
we find a fidelity of 78 ± 1% to a process in which the spurious
two-qubit phase between Q, and Q; is set to the independently 
Figure 2 | Pulse sequence and classical action of the three-qubit gate. a, The
frequencies of the three qubits and the locations of applied rotations during the
three-qubit gate as functions of time. Shaded region: to produce the CCPhase
interaction, Q3 is first moved suddenly into resonance with the avoided
crossing shown in Fig. 1a, which coherently transfers the population of |111⟩ to
|102⟩ (and also that of |011⟩ to |002⟩) in 7 ns. Fine adjustments in the first point
of the pulse compensate for finite pulse rise time and temporal precision. The
frequency of Q, is then abruptly increased to where its two-qubit phase with Q,
is cancelled during the gate by accumulating a multiple of 2π. The frequency of
Q3 is then increased adiabatically to initiate the interaction between |102⟩ and
|003⟩. The duration and amplitude of this excursion are tuned to acquire a three-qubit
phase of π. The population in |102⟩ is then transferred back to |111⟩ by
reversing the swap procedure. Finally, the two-qubit phase between Q, and Q,
is cancelled with an additional adiabatic interaction, which is sped up with a
π-pulse on Q; at 37 ns (all rotations here are done about the x axis). The two-qubit
phase between Q, and Q; is uncontrolled and there is an overall
π-rotation of Q,, making this a $\pi\text{-}\mathrm{CC}\text{-}e^{i\varphi Z}$ gate, taking a total of 63 ns.
Unshaded region: pre- and post-gate rotations on Q2 appended to the CCPhase
gate turn its action into that of a CCNOT gate, as described in Supplementary
Information. b, The classical action of the CCNOT gate is measured by
preparing the eight computational basis states, $|\psi_{\mathrm{in}}\rangle$, and performing state
tomography on the resulting state, $|\psi_{\mathrm{out}}\rangle$, after applying the gate, O, to them.
The projection of these measurements to the computational basis states is taken
to generate the displayed truth table. The fidelity to the expected action, where
only the states |101⟩ and |111⟩ are swapped, is 85 ± 1%. Full quantum process
tomography of the gate is shown in Supplementary Information.


measured value of 57° (see Supplementary Information for an explanation
of why this phase is irrelevant here). Owing to this extraneous
phase, φ, the gate is most accurately described as a $\mathrm{CC}\text{-}e^{i\varphi Z}$ gate. The
loss of fidelity is consistent with the expected energy relaxation of the 
three qubits during the 85-ns tomography procedure, which includes 
preparation and analysis pulses in addition to the gate, with some 
remaining error due to qubit transition frequency drift during the 
90 min it takes to collect the full data set. 
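The recipe of sandwiching a CCPhase gate between single-qubit rotations on Q2 can be checked with a small state-vector simulation. A sketch assuming an ideal CCPhase (no extraneous two-qubit phase) and Hadamard gates as the pre- and post-rotations; Hadamards are our choice for illustration, and the rotations actually used in the experiment are the ones described in its Supplementary Information. Basis states are ordered |Q1 Q2 Q3⟩, and the helper names are hypothetical.

```python
import math

SQRT_HALF = 1 / math.sqrt(2)
HADAMARD = [[SQRT_HALF, SQRT_HALF], [SQRT_HALF, -SQRT_HALF]]


def apply_1q(state, gate, qubit):
    """Apply a 2x2 gate to one qubit (0 = Q1, 1 = Q2, 2 = Q3).

    Basis index = 4*q1 + 2*q2 + q3, matching the |abc> ordering of the text.
    """
    bit = 1 << (2 - qubit)
    out = list(state)
    for i in range(8):
        if not i & bit:
            a, b = state[i], state[i | bit]
            out[i] = gate[0][0] * a + gate[0][1] * b
            out[i | bit] = gate[1][0] * a + gate[1][1] * b
    return out


def apply_ccphase(state):
    """Ideal CCPhase: |111> -> -|111>, every other basis state unchanged."""
    out = list(state)
    out[0b111] = -out[0b111]
    return out


def ccnot_via_ccphase(state):
    """Rotate Q2, apply CCPhase, rotate Q2 back: a CCNOT with target Q2."""
    return apply_1q(apply_ccphase(apply_1q(state, HADAMARD, 1)), HADAMARD, 1)


def truth_table():
    """Map each computational input to the basis state carrying its output."""
    table = {}
    for i in range(8):
        basis = [0.0] * 8
        basis[i] = 1.0
        out = ccnot_via_ccphase(basis)
        j = max(range(8), key=lambda k: abs(out[k]))
        table[format(i, "03b")] = format(j, "03b")
    return table


print(truth_table())
```

The printed truth table differs from the identity only in swapping 101 and 111, the classical action quoted above: on the subspace where Q1 and Q3 are both excited, H·Z·H = X flips Q2, and everywhere else the two Hadamards cancel.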

With our CCPhase gate in hand, we now demonstrate three-qubit
error correction. We first examine the bit-flip code, which, as shown in
Fig. 3a, starts by encoding the quantum state to be protected in a three-qubit
entangled state through the use of conditional phase (CPhase)
gates. The state α|0⟩ + β|1⟩ is encoded as α|000⟩ + β|111⟩, which has
the property that the value of any two-qubit ZZ product is +1 regardless
of the values of α and β. (For quantum states on the equator of the
Bloch sphere, |α| = |β| = 1/√2 and the encoding is a maximally
entangled three-qubit Greenberger-Horne-Zeilinger state that
we independently measure to have a state fidelity of 89 ± 1%.) If any
single qubit is flipped, one or more of the ZZ products will flip sign as
well. For example, if Q1 were flipped, the Z1Z2 product would become
−1 whereas the Z2Z3 product would remain +1, uniquely indicating
that Q1 needs to be corrected. Indeed, the four possible combinations
of Z1Z2 and Z2Z3 exactly encode the possible single bit flips, including
the possibility of no flip. In a fault-tolerant code, these products would
be stored in separate qubits for later measurement, but here we instead
reverse the encoding so that the ancillas Q1 and Q3 can no longer
witness bit-flip errors and instead store the values of the two ZZ products.
These ancillas are then used as the control bits for the CCNOT
gate described above, so that Q2 will be flipped back if and only if both
ancillas are excited, which indicates that Q2 was flipped. The detailed
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evolution of the qubits during the error correction procedure can be 
found in Supplementary Information. 
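The syndrome logic above can be tabulated with a few lines of Python. This is a classical sketch that only tracks the two computational branches of α|000⟩ + β|111⟩; it assumes the qubit order (Q1, Q2, Q3) with Q2 protected, and the function names are illustrative rather than taken from the paper.

```python
# Qubit order (Q1, Q2, Q3); Q2 holds the protected state, Q1 and Q3 are ancillas.
CODEWORDS = ((0, 0, 0), (1, 1, 1))  # the two branches of alpha|000> + beta|111>


def zz(bits, i, j):
    """Eigenvalue of Z_i Z_j on a computational basis state: +1 if the bits agree."""
    return 1 if bits[i] == bits[j] else -1


def syndrome(bits):
    """The pair (Z1Z2, Z2Z3) that the decoded ancillas end up storing."""
    return (zz(bits, 0, 1), zz(bits, 1, 2))


def syndrome_table():
    """Map each syndrome to the single bit flip (or none) that produces it."""
    table = {}
    for label, k in (("none", None), ("Q1", 0), ("Q2", 1), ("Q3", 2)):
        seen = set()
        for word in CODEWORDS:
            bits = list(word)
            if k is not None:
                bits[k] ^= 1  # single bit flip on qubit k
            seen.add(syndrome(bits))
        # Both branches give the same syndrome, so it reveals nothing about alpha, beta.
        if len(seen) != 1:
            raise ValueError("syndrome depends on the encoded amplitudes")
        table[seen.pop()] = label
    return table


for s, label in syndrome_table().items():
    print(s, label)
```

Only the syndrome (−1, −1), with both ancillas excited, corresponds to a flip of Q2, which is why a single CCNOT controlled on both ancillas is enough to restore the protected qubit; flips of Q1 or Q3 leave Q2 untouched.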

Whereas errors on classical bits are discrete, quantum error correction
must be able to correct arbitrary rotations as well as complete
flips because superpositions of states are allowed. Remarkably,
the code described above already satisfies this criterion. If an error
causes a rotation θ on Q2, the quantum state after decoding will
be $\sqrt{1-p}\,(\alpha|0\rangle + \beta|1\rangle)\otimes|00\rangle + \sqrt{p}\,(\beta|0\rangle + \alpha|1\rangle)\otimes|11\rangle$,
where $p = \sin^2(\theta/2)$ is the effective probability of a full flip and where we have
listed first the state of Q2 followed by those of Q1 and Q3 for notational
simplicity. That is, the state will be a superposition of Q2 in the correct
state with the ancillas indicating no error plus Q2 flipped with the
ancillas indicating as such. The application of the CCNOT gate to
this state will successfully correct it because it acts only on the
subspace where both ancillas are excited, making the state
Figure 3 | Bit-flip error correction demonstrating recovery from a single
arbitrary rotation. a, The error correction protocol starts by encoding the
quantum state to be protected in a three-qubit state by entangling the two ancilla
qubits, Q1 and Q3, with Q2 through the use of two CPhase gates (vertical lines
terminating in solid circles) and π/2-rotations ($R_n^{\alpha}$ is a single-qubit rotation,
where n indicates the rotation axis and α is the rotation angle). The number
adjacent to each CPhase gate indicates which state receives a phase shift of π. A
single y-rotation error of a known angle is then performed on a single qubit (as
explained in Supplementary Information, this is compiled together with other
rotations when acting on the ancillas). The state is then decoded, leaving the
ancillas in a product state indicating which single-qubit error occurred. For finite
rotations, the ancillas will be in a superposition of states in which the error did
and did not occur, each tensor-multiplied with the associated
single-qubit state of Q2. If an error occurred on Q2, the CCNOT gate
implemented with our CCPhase gate (represented by three solid circles linked by
a vertical line) at the end of the code will correct it. We then perform three-qubit
state tomography to verify the result. b, State fidelity to the created state
|ψ⟩ = |+X⟩ after causing an error on one of the qubits, with and without error
correction. Ideally, the error-corrected curves would be horizontal lines at unit
fidelity. Finite excited-state lifetimes cause oscillations and lower fidelity because
errors change the excitation level of the system. c, Two-qubit density matrices (ρ)
of the ancillas after each of the four possible full bit-flip errors has occurred. The
fidelities of these states to the ideal error syndromes, |00⟩, |01⟩, |10⟩ and |11⟩, are
respectively 81.3%, 69.7%, 73.1% and 61.2%.
qubits were measured, they would project the state onto one of those
possibilities, essentially forcing the computer to ‘decide’ whether a full
flip had occurred.) We demonstrate this with the procedure shown in
Fig. 3a for the state |ψ⟩ = |+X⟩ (the positive eigenstate of the Pauli
operator X), performing single deterministic rotations of a known
angle on each of the three qubits to simulate errors. As shown in
Fig. 3b, we compare this with the case of uncorrected errors on Q2.
Ideally, the error-corrected curves would have unit fidelity and be
independent of θ, but they are slightly lower in fidelity and oscillate
in θ owing to finite coherence. They are, however, substantially
improved relative to the uncorrected case, demonstrating that the
errors are in fact being ameliorated. As shown in Fig. 3c, we also
measure the two-qubit density matrix of the ancilla qubits after each
of the four possible full bit-flip errors, showing that they end up in a
computational product state correctly indicating the induced error.
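The claim that a single CCNOT also undoes a partial rotation can be checked with a small state-vector simulation in pure Python. It is a sketch under idealized assumptions that are not the paper's: CNOT gates stand in for the CPhase-plus-π/2-rotation encoding of Fig. 3a, the error is an x rotation R_x(θ) on Q2 (Fig. 3 uses y rotations; the argument is the same, and the extra −i on the flipped branch does not affect the correction), and there is no decoherence. Function names are hypothetical.

```python
import math

# Basis index = 4*q1 + 2*q2 + q3 for |Q1 Q2 Q3>; Q2 is the protected qubit.
def mask(q):
    return 1 << (2 - q)


def cnot(state, control, target):
    out = [0j] * 8
    for idx, amp in enumerate(state):
        out[idx ^ mask(target) if idx & mask(control) else idx] += amp
    return out


def toffoli(state, c1, c2, target):
    out = [0j] * 8
    for idx, amp in enumerate(state):
        fire = idx & mask(c1) and idx & mask(c2)
        out[idx ^ mask(target) if fire else idx] += amp
    return out


def rx(state, q, theta):
    """Assumed error model: R_x(theta) = cos(theta/2) I - i sin(theta/2) X on qubit q."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    out = [0j] * 8
    for idx, amp in enumerate(state):
        out[idx] += c * amp
        out[idx ^ mask(q)] += -1j * s * amp
    return out


def encode(state):
    """Copy Q2 onto Q1 and Q3 in the Z basis; applying it twice is the identity."""
    return cnot(cnot(state, 1, 0), 1, 2)


def protected_fidelity(alpha, beta, theta, correct=True):
    state = [0j] * 8
    state[0b000], state[0b010] = alpha, beta  # |0>_1 (alpha|0> + beta|1>)_2 |0>_3
    state = encode(state)
    state = rx(state, 1, theta)               # rotation error on Q2
    state = encode(state)                     # decoding uses the same CNOTs
    if correct:
        state = toffoli(state, 0, 2, 1)       # flip Q2 iff both ancillas are excited
    # Reduced density matrix of Q2, tracing out the ancillas Q1 and Q3.
    rho = [[0j, 0j], [0j, 0j]]
    for q1 in (0, 1):
        for q3 in (0, 1):
            for a in (0, 1):
                for b in (0, 1):
                    ia, ib = 4 * q1 + 2 * a + q3, 4 * q1 + 2 * b + q3
                    rho[a][b] += state[ia] * state[ib].conjugate()
    psi = (alpha, beta)
    f = sum(psi[a].conjugate() * rho[a][b] * psi[b] for a in (0, 1) for b in (0, 1))
    return f.real


for theta in (0.0, 0.5, 1.5, 3.0):
    print(theta, protected_fidelity(1, 0, theta), protected_fidelity(1, 0, theta, False))
```

For the input |0⟩ the uncorrected fidelity falls as cos²(θ/2) = 1 − p, while the corrected fidelity stays at 1 for every θ in this noiseless model, mirroring the ideal horizontal lines described for Fig. 3b.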
In real physical systems, errors occur at approximately the same rate 
on all constituent qubits rather than on one at a time. The correction 
Figure 4 | Demonstration of first-order insensitivity to simultaneous phase-flip
errors. a, The phase-flip error correction protocol differs from the bit-flip
protocol described in Fig. 3a only by single-qubit gates. Those gates effectively
rotate the coordinate system, mapping phase flips to bit flips, and vice versa, so
the remainder of the procedure is exactly the same as in the bit-flip case’. We
perform errors on all three qubits simultaneously with z gates of known rotation
angle, which is equivalent to phase-flip errors with probability $p = \sin^2(\theta/2)$. As
with the bit-flip code, if a single error has occurred on the primary qubit, the
CCNOT gate at the end of the code will undo it. b, Fidelity of the process matrix
of the protected qubit to the identity operation plotted as a function of p. As the
code corrects only single errors, it will fail on the three-qubit subspace where
more than one has occurred, which happens with a probability $3p^2 - 2p^3$. The
coefficients here are reduced for processes with finite fidelity. The process
fidelity is fitted with $f = (0.760 \pm 0.005) - (1.46 \pm 0.03)p^2 + (0.72 \pm 0.03)p^3$. If
a linear term is allowed, its best-fit coefficient is 0.03 ± 0.06. We compare this
with the case of no error correction to simulate the improvement made when
the decoherence of Q2 is normalized away (blue symbols). We also plot the
simulated fidelity of a perfect but non-corrected system (black dashed line),
which indicates that for our gate fidelities we do not show a net improvement for
artificial errors. Insets: the constituent state fidelities of the four basis states used
to produce the process fidelity data in the case with error correction (right) and
in the case with no correction (left). The x axes of the plots are the same as the
main panel, and they share the same y axis. The state |+Y⟩ (the positive
eigenstate of the Pauli operator Y) is immune to errors because its encoded state
is an eigenstate of single, double and triple qubit phase flips.


©2012 Macmillan Publishers Limited. All rights reserved 


scheme will only succeed, therefore, on the three-qubit subspace with
zero or one errors. The probability of more than one error occurring is
$3p^2 - 2p^3$, where p is the single-qubit error rate’, so the fidelity of error
correction should be $1 - 3p^2 + 2p^3$. For a scheme with gate fidelity
limited by decoherence, the coefficients of the quadratic and cubic
terms will be smaller but, crucially, any linear dependence on p will
be strongly suppressed. If the error rates for each qubit were different,
these coefficients would again be modified but any linear dependence
would still be abated. For the sake of completeness, here we use the
phase-flip code, which differs from the previously discussed bit-flip
code by only single-qubit rotations, as shown in Fig. 4a. This difference
can be viewed as a rotation of the coordinate system, converting phase
flips to bit flips and vice versa, so the remainder of the code is exactly
the same as the previous case”'*””. Phase errors of known rotation
angle are applied by rotating the frame of reference of subsequent x
and y rotations. As shown in Fig. 4b, we measure the process fidelity of
the error correction scheme as a function of p and compare this with
the case of no error correction in which identical single-qubit rotations
are applied to Q2 but the ancillas are not involved (this comparison is
without gates, but with appropriate delays to have the same total
procedure duration, to indicate the loss of fidelity due to the decoherence
of Q2). Whereas without error correction we find a purely linear
dependence on p, with the correction applied the data are extremely
well modelled by only quadratic and cubic terms, demonstrating the
desired first-order insensitivity to errors. We have therefore realized a
successful implementation of quantum error correction, although
improvement of the fidelity of a real physical process will require
considerable advances in both gate fidelity and device complexity.
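The counting argument above can be checked by enumerating the eight error patterns of three qubits that each err independently with probability p, an idealization that ignores gate errors. The same sketch checks the basis change behind the phase-flip code, using a Hadamard as a stand-in for the single-qubit gates of Fig. 4a (an assumption; the experiment uses its own rotations): conjugating a z-rotation error R_z(θ) by H gives an x rotation R_x(θ), that is, a bit-flip error with the same p = sin²(θ/2). Function names are hypothetical.

```python
import cmath
import math
from itertools import product


def prob_two_or_more(p):
    """Probability that two or three of three independent qubits err, each with probability p."""
    total = 0.0
    for pattern in product((0, 1), repeat=3):
        k = sum(pattern)
        if k >= 2:
            total += p ** k * (1 - p) ** (3 - k)
    return total


def ideal_corrected_fidelity(p):
    return 1 - 3 * p ** 2 + 2 * p ** 3


# Basis change for the phase-flip code, sketched with a Hadamard.
S = 1 / math.sqrt(2)
H = [[S, S], [S, -S]]


def matmul2(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def rz(theta):
    return [[cmath.exp(-0.5j * theta), 0], [0, cmath.exp(0.5j * theta)]]


def rx(theta):
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -1j * s], [-1j * s, c]]


def phase_error_in_rotated_frame(theta):
    """H R_z(theta) H, which should equal R_x(theta)."""
    return matmul2(H, matmul2(rz(theta), H))


for p in (0.01, 0.1, 0.3):
    print(p, prob_two_or_more(p), ideal_corrected_fidelity(p), 1 - p)
```

Because 1 − 3p² + 2p³ − (1 − p) = p(1 − p)(1 − 2p), the ideal corrected fidelity exceeds the uncorrected 1 − p for any 0 < p < 1/2 (for example 0.972 versus 0.9 at p = 0.1); with real gates, as the Fig. 4 caption notes, the reduced coefficients mean this break-even is not yet reached.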

We have realized both bit- and phase-flip error correction in a 
superconducting circuit. In doing so, we have tested both main 
conceptual components of the nine-qubit Shor code’, which can 
defend against arbitrary single-qubit errors by concatenating the bit- 
and phase-flip codes. The implementation relies on our efficient three- 
qubit gate, which uses non-computational states in the third excitation 
manifold of our system, demonstrating that the simple Hamiltonian of 
the system accurately predicts the dynamics even at these high excita- 
tion levels. The gate takes approximately half the time of an equivalent 
construction with one- and two-qubit gates. We expect it to work 
between any three nearest-neighbour qubits in frequency regardless 
of the number of qubits sharing the bus, as interactions involving other 
qubits will be first-order prohibited. 


METHODS 


Arbitrary qubit rotations around the x and y axes of the Bloch sphere are per- 
formed with pulse-shaped resonant microwave tones. Rotations around the z axis 
are made by rotating the reference phase of subsequent x and y pulses. One-qubit 
dynamical phases resulting from flux excursions are measured with modified 
Ramsey experiments comparing the phase acquired by an unmodified prepared 
state with the phase acquired by that same state after a flux pulse, and are cancelled 
with z rotations. Two- and three-qubit phases are measured with a similar Ramsey 
experiment comparing the phase acquired when a control qubit is in its ground 
state with the phase acquired when it is in an excited state. For example, the two- 
qubit phase between Q, and Q; is measured by preparing Q; along the y axis and 
Q either in its ground or excited state and then performing the flux pulse in both 
cases. The single-qubit phase of Q; is the same for both states, so the two-qubit 
phase is directly measurable as the difference in phase between them. All phases 
are initially tuned to within 1°, limited by the resolution of control equipment and 
drifts of system parameters such as the qubit transition frequencies. 
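The Ramsey comparison for two-qubit phases can be modelled in a few lines. This sketch assumes that the flux pulse acts on a control-target pair as a diagonal unitary that imprints a hypothetical phase φ_ct on each basis state |c t⟩; under that assumption the difference between the target's Ramsey phases with the control excited and in its ground state equals φ11 − φ10 − φ01 + φ00, and the target's single-qubit phase cancels. The function names and the example phases are illustrative only.

```python
import cmath
import math


def wrap(angle):
    """Map an angle to the interval [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


def ramsey_phase(phases, control):
    """Phase acquired by a target prepared along +y, (|0> + i|1>)/sqrt(2), for a
    fixed control state. phases[(c, t)] is the (hypothetical) phase that the
    flux pulse imprints on the basis state |c t>."""
    a0 = cmath.exp(1j * phases[(control, 0)]) / math.sqrt(2)
    a1 = 1j * cmath.exp(1j * phases[(control, 1)]) / math.sqrt(2)
    # Relative phase of |1> with respect to |0>, measured from the initial +y direction.
    return wrap(cmath.phase(a1 / a0) - math.pi / 2)


def conditional_phase(phases):
    """Two-qubit phase: the target's Ramsey phase with the control excited minus
    that with the control in its ground state."""
    return wrap(ramsey_phase(phases, 1) - ramsey_phase(phases, 0))


example = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): -0.2, (1, 1): 1.4}
print(conditional_phase(example))
```

With the example phases the sketch returns approximately 1.2 rad, that is (1.4 + 0.2) − (0.5 − 0.1); adding the same offset to φ01 and φ11, a purely single-qubit phase on the target, leaves the result unchanged.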


Received 21 September; accepted 7 December 2011. 
Published online 1 February 2012. 


1. Shor, P. W. Scheme for reducing decoherence in quantum computer memory. 
Phys. Rev. A 52, R2493-R2496 (1995). 




2. Nielsen, M. A. & Chuang, I. L. Quantum Computation and Quantum Information
(Cambridge Ser. Inf. Nat. Sci., Cambridge Univ. Press, 2000).

3. DiCarlo, L. et al. Preparation and measurement of three-qubit entanglement in a
superconducting circuit. Nature 467, 574-578 (2010).

4. Neeley, M. et al. Generation of three-qubit entangled states using superconducting
phase qubits. Nature 467, 570-573 (2010).

5. Paik, H. et al. Observation of high coherence in Josephson junction qubits
measured in a three-dimensional circuit QED architecture. Phys. Rev. Lett. 107,
240501 (2011).

6. Kim, Z. et al. Decoupling a Cooper-pair box to enhance the lifetime to 0.2 ms. Phys.
Rev. Lett. 106, 120501 (2011).

7. Cory, D. et al. Experimental quantum error correction. Phys. Rev. Lett. 81,
2152-2155 (1998).

8. Knill, E., Laflamme, R., Martinez, R. & Negrevergne, C. Benchmarking quantum
computers: the five-qubit error correcting code. Phys. Rev. Lett. 86, 5811-5814
(2001).

9. Boulant, N., Viola, L., Fortunato, E. & Cory, D. Experimental implementation of a
concatenated quantum error-correcting code. Phys. Rev. Lett. 94, 130501
(2005).

10. Moussa, O., Baugh, J., Ryan, C. A. & Laflamme, R. Demonstration of sufficient
control for two rounds of quantum error correction in a solid state ensemble
quantum information processor. Phys. Rev. Lett. 107, 160501 (2011).

11. Chiaverini, J. et al. Realization of quantum error correction. Nature 432, 602-605
(2004).

12. Schindler, P. et al. Experimental repetitive quantum error correction. Science 332,
1059-1061 (2011).

13. Shor, P. W. in Proc. 37th Symp. Foundations Comput. 56-65 (IEEE, 1996); preprint
a

14. Reed, M. D. et al. Fast reset and suppressing spontaneous emission of a
superconducting qubit. Appl. Phys. Lett. 96, 203110 (2010).

15. Shor, P. W. Polynomial-time algorithms for prime factorization and discrete
logarithms on a quantum computer. SIAM J. Sci. Statist. Comput. 26, 1484-1509
(1997).

16. Lanyon, B. P. et al. Simplifying quantum logic using higher-dimensional Hilbert
spaces. Nature Phys. 5, 134-140 (2009).

17. Monz, T. et al. Realization of the quantum Toffoli gate with trapped ions. Phys. Rev.
Lett. 102, 040501 (2009).

18. Mariantoni, M. et al. Implementing the quantum von Neumann architecture with
superconducting circuits. Science 334, 61-65 (2011).

19. Fedorov, A., Steffen, L., Baur, M. & Wallraff, A. Implementation of a Toffoli gate with
superconducting circuits. Nature 481, 170-172 (2012).

20. Wallraff, A. et al. Strong coupling of a single photon to a superconducting qubit
using circuit quantum electrodynamics. Nature 431, 162-167 (2004).

21. Schreier, J. A. et al. Suppressing charge noise decoherence in superconducting
charge qubits. Phys. Rev. B 77, 180502 (2008).

22. Majer, J. et al. Coupling superconducting qubits via a cavity bus. Nature 449,
443-447 (2007).

23. DiCarlo, L. et al. Demonstration of two-qubit algorithms with a superconducting
quantum processor. Nature 460, 240-244 (2009).

24. Reed, M. et al. High-fidelity readout in circuit quantum electrodynamics using the
Jaynes-Cummings nonlinearity. Phys. Rev. Lett. 105, 173601 (2010).

25. Strauch, F. et al. Quantum logic gates for coupled superconducting phase qubits.
Phys. Rev. Lett. 91, 167005 (2003).

26. Greenberger, D. M., Horne, M. A. & Zeilinger, A. in Bell’s Theorem, Quantum Theory
and Conceptions of the Universe (ed. Kafatos, M.) 69-72 (Kluwer Academic,
1989).

27. Tornberg, L., Wallquist, M., Johansson, G., Shumeiko, V. & Wendin, G.
Implementation of the three-qubit phase-flip error correction code with
superconducting qubits. Phys. Rev. B 77, 214528 (2008).


Supplementary Information is linked to the online version of the paper at 
www.nature.com/nature. 


Acknowledgements We thank G. Kirchmair, M. Mirrahimi, I. Chuang and M. Devoret for
discussions. We acknowledge support from LPS/NSA under ARO contract no. 
W911NF-09-1-0514 and from the NSF under grants no. DMR-0653377 and no. 
DMR-1004406. Additional support was provided by CNR-Istituto di Cibernetica, 
Pozzuoli, Italy (L.F.), the Swiss NSF (S.E.N.) and the Dutch NWO (L.D.C.). 


Author Contributions M.D.R. carried out measurements and performed data analysis. 
L.D.C. designed the three-qubit gate and conducted initial measurements. L.S. 
provided further experimental contributions. S.E.N. and S.M.G. provided theoretical 
support. L.F., L.D.C. and L.S. fabricated the devices. M.D.R. wrote the manuscript, with
feedback from all authors. S.M.G. and R.J.S. designed and supervised the project.


Author Information Reprints and permissions information is available at 
www.nature.com/reprints. The authors declare no competing financial interests. 
Readers are welcome to comment on the online version of this article at 
www.nature.com/nature. Correspondence and requests for materials should be 
addressed to M.D.R. (matthew.reed@yale.edu) or R.J.S. (robert.schoelkopf@yale.edu).


16 FEBRUARY 2012 | VOL 482 | NATURE | 385 




doi:10.1038/nature10749 


Origin of Columbia River flood basalt controlled by 
propagating rupture of the Farallon slab 


Lijun Liu & Dave R. Stegman


The origin of the Steens-Columbia River (SCR) flood basalts, which 
is presumed to be the onset of Yellowstone volcanism, has remained 
controversial, with the proposed conceptual models involving 
either a mantle plume’~* or back-arc processes®*. Recent tomo- 
graphic inversions based on the USArray data reveal unprecedented 
detail of upper-mantle structures of the western USA’ and tightly 
constrain geodynamic models simulating Farallon subduction, 
which has been proposed to influence the Yellowstone volcanism”*. 
Here we show that the best-fitting geodynamic model’ depicts an 
episode of slab tearing about 17 million years ago under eastern 
Oregon, where an associated sub-slab asthenospheric upwelling 
thermally erodes the Farallon slab, leading to formation of a slab 
gap at shallow depth. Driven by a gradient of dynamic pressure, the 
tear ruptured quickly north and south, covering within about two million
years a distance of around 900 kilometres along all of
eastern Oregon and northern Nevada. This tear would be consistent 
with the occurrence of major volcanic dikes during the SCR- 
Northern Nevada Rift flood basalt event both in space and time. 
The model predicts a petrogenetic sequence for the flood basalt with 
sources of melt starting from the base of the slab, at first remelting 
oceanic lithosphere and then evolving upwards, ending with remelt- 
ing of oceanic crust. Such a progression helps to reconcile the exist- 
ing controversies on the interpretation of SCR geochemistry and the 
involvement of the putative Yellowstone plume. Our study suggests 
a new mechanism for the formation of large igneous provinces. 
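As a rough consistency check on the figures quoted above (an illustrative average, not an output of the model), the rate at which the tear lengthened is approximately:

```latex
\bar{v}_{\mathrm{tear}} \approx \frac{900\ \mathrm{km}}{2\ \mathrm{Myr}}
  = 450\ \mathrm{km\,Myr^{-1}}
  = \frac{4.5\times10^{7}\ \mathrm{cm}}{10^{6}\ \mathrm{yr}}
  = 45\ \mathrm{cm\,yr^{-1}}
```

Because the tear propagated both north and south, each rupture front would have advanced more slowly than this combined rate if the two fronts moved at the same time.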

The SCR igneous province of the Pacific Northwest represents one of 
the largest continental flood basalt events, with a total eruption volume 
of around 230,000 km³ over approximately two million years (Myr)
(ref. 3). This massive, fast eruption seems to favour a mantle plume 
origin’ *, but a plume model cannot address why most SCR flood basalt 
erupted in a north-south-oriented region perpendicular to the sub- 
sequent Yellowstone hotspot track along the eastern Snake River plain 
(Fig. 1). Recent models trying to explain this complexity include 
spreading of the plume head along a lithospheric gradient* and lateral 
deflection of the plume conduit due to mantle flow®. Other workers, 
however, dismissed the plume hypothesis as conjecture, and argued 
that the SCR event could have been a result of shallow-mantle pro- 
cesses, such as back-arc extension® or small-scale convection’ or litho- 
sphere delamination®. These conceptual models are all based on some 
aspects of surface geologic features, but the implied underlying mantle 
dynamics differ significantly from each other. A better understanding 
of the SCR flood basalt formation, therefore, requires an improved 
knowledge of mantle processes during the mid-Miocene epoch (about 
16 Myr ago), especially given that Farallon subduction adequately 
explains both the observed surface plate kinematics and continental 
deformation within the western United States. 
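The eruption figures quoted at the start of this paragraph correspond to a time-averaged effusion rate of roughly the following (a crude average; Table 1 indicates that most of the volume, the Grande Ronde unit, erupted within about 1.1 Myr):

```latex
\bar{Q} \approx \frac{2.3\times10^{5}\ \mathrm{km^{3}}}{2\ \mathrm{Myr}}
  \approx 1.2\times10^{5}\ \mathrm{km^{3}\,Myr^{-1}}
  \approx 0.1\ \mathrm{km^{3}\,yr^{-1}}
```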

A promising way to infer past mantle dynamics is by predicting its 
present-day state through geodynamic modelling using a technique 
called data assimilation, which can be either sequential’! or vari- 
ational’. Here we adopt the sequential technique and assimilate plate 
motion history, palaeo-seafloor ages and palaeo-geometry of plate 
boundaries into a single geodynamic model’. The model integrates
from 40 Myr ago to the present, through which we try to predict the
observed mantle structures beneath the western USA outlined with 
increasing detail by a sequence of tomographic inversions based on 
data from the USArray’ (see Supplementary Fig. 1 for more tomo- 
graphy models). The most robust seismic feature, a 500-km-wide 
columnar fast anomaly extending from 300 to 600 km depth below
Nevada and western Utah, was found to be a segmented piece of the 
Farallon slab that was folded upward along its edges by toroidal mantle 
flows during its descent (Supplementary Fig. 2). Tracking backward in 
time, the initial break-off of this slab happened around 17 Myr ago at 
shallow depth (Figs 1 and 2). This is because the shrinking width of the
oceanic plate’* gradually built up dynamic pressure beneath the middle
part of the subducting slab, where the increased pressure caused the
slab to flatten’? (see Fig. 2a for example). Consequently, the flattening
slab slowed down the forward plate motion, and sped up back-arc
extension, as predicted by fully dynamic subduction models’*. The
reduction in plate motion and regionally focused sub-slab dynamic
pressure generated an extension in the central part of the slab hinge,
which, combined with thermal erosion due to sub-slab upwelling,
caused the weak slab to stretch and eventually break in the middle’’
(Figs 2 and 3), similar in some ways to slab tear migrations proposed
for Mediterranean slabs”.

[Figure 1 map, longitude 124° W to 112° W; legend: subduction zone at 16 Myr ago; area of slab tear at 70 km depth; slab edge (70 km).]

Figure 1 | Development of the Farallon slab rupture beneath the western
USA during the mid-Miocene epoch. Geometry of Farallon subduction at
different times (corresponding to different colours, ages in Myr as shown) is
projected onto North America. Both the slab edge (solid lines) and slab gap
(filled area) are at 70 km depth, outlined by an isotherm (−50 °C relative to
ambient mantle). Major SCR volcanic dikes following ref. 27 are shown as
yellow patterns (with ages in Myr shown). Grey arrows indicate the direction of
age progression (ages in Myr) of surface eruptions. WSRP, western Snake
River plain; ESRP, eastern Snake River plain; NNR, northern Nevada rift zone.

Figure 2 | Mantle flow associated with the slab tearing at latitude 44° N
from 18 to 14 Myr ago. Axes in b–e are the same as those in a. Solid yellow
outlines represent the high-viscosity North America continent. Inverted black
triangles mark trench locations with time. Insets are zoom-in plots of the
subduction zones outlined by the white dashed boxes. The green dashed line
represents the depth of 110 km, above which melting would occur’. We do not
explicitly model melt generation because parameterizations of melting processes
commonly implemented are applicable to melting of ambient mantle material
and are not relevant to different compositions such as oceanic lithosphere and
oceanic crust, as in the scenario presented here. Black arrows show mantle
velocities, and the change in temperature is relative to the ambient mantle.

To understand the possible surface effects of this slab tear, we pro- 
ject the Farallon subduction system onto the reference frame of North 
America, neglecting the internal deformation of the western USA due 
to Basin and Range extension"®. The initial slab tear occurred as early as 
about 17 Myr ago, which, in a map view, formed a trench-parallel slab 
gap (defined by an isotherm at a depth of 70 km) inside the back-arc 
region beneath southeastern Oregon (Fig. 1). This gap coincides with 
the location of Steens Mountain, which recorded the initial phase of 
eruption during the SCR flood basalt event. The slab tear ruptured 
quickly to north and south along the trench-parallel direction (Fig. 3), 
while at the same time the progressive flattening shifted the slab gap to 
the east (Figs 1 and 2). By 15 Myr ago, the gap occupied all of east 
Oregon, southwest Idaho and north Nevada (Fig. 1). This propagating 
pattern of slab tear correlates with the sequence of major mid-Miocene 
volcanic dikes both in space and time. From south to north with 
decreasing age, the western Snake River plain and Chief Joseph dike 
swarms hosted an increasing amount of magma outpouring, forming 
the Imnaha and Grande Ronde magmatic provinces’, consistent with 
the northward rupturing and widening of the slab gap. Meanwhile, the 
southward slab rupturing with a narrower gap also explains the rapid 
formation (within about 1.5 Myr) of the northern Nevada rift zone’’ 
and the observed limited magma eruption along this region. 
Furthermore, the reduction of trenchward wedge flow inside Oregon 
after 17 Myr ago (Figs 2 and 3) provides a physical explanation for the 
reduced magmatic activities along the Oregon coastal arc since this 
time’’”. About 14 Myr ago, the Farallon slab was largely separated into 
two pieces along the down-dip direction, the landward extent of flat 
slab underplating started to retreat to the west (Figs 1 and 2), and sub- 
slab pressure was largely equilibrated with the mantle wedge, consist- 
ent with termination of the SCR eruption around this time’. 

Another important condition for making flood basalts is sufficient 
subsurface mantle upwelling, which is a salient feature of the mantle 
plume model. We also observe a strong, focused mantle upwelling below
the slab hinge, driven by the excess dynamic pressure beneath the sub- 
ducting plate (Figs 2 and 3). Melting in this setting can reasonably be 
expected, because melts are generated at depths shallower than 110 km
along forced upwellings in oceanic environments'*. This fast upwelling 
flow advectively erodes the slab isotherm, leading to a progressively 
shallower asthenosphere, which suggests that oceanic lithosphere 
would melt first, followed by melting of oceanic crust. The modelled 
slab initially acted as both a thermal and mechanical barrier to upward 
flow and inhibited melting, as can be seen from 18 Myr ago, when the 
upwelling simply flattened the slab (Fig. 2a, inset). Subsequent thermal 
erosion of the oceanic lithosphere (Fig. 2b) would allow melt produc- 
tion under the Steens Mountain and Imnaha provinces (Table 1). By 
16 Myr ago, continued thermal erosion melted away the entire oceanic 
lithosphere (Fig. 2c), where the fast melting of oceanic crust encounter- 
ing the high mantle temperature should have triggered the massive 
magma outpouring of Grande Ronde’. The sustained cooling within 
the upwelling due to progressive melting seems to explain the marked 
decrease in magma volume during the formation of the Imnaha 
province (Table 1) and the step-function change in chemistry at the 
Imnaha-Grande Ronde boundary”. The upward corner flow inside 
the mantle wedge may also have facilitated melt generation, but its 
volume contribution is probably minor, given the much colder environ- 
ment below the overriding plate, an aspect not explicitly represented in 
our convection model (Figs 2 and 3). 

The geochemical data, essential for determining the source rocks,
have been used both for and against a plume signature in the SCR
lavas'*?*". Although origins of both major and trace elements are still
widely debated among existing conceptual models, there are certain
points of consensus (Table 1). (1) Steens Mountain lavas are derived
from an ultra-depleted mantle source”, probably “owing to repetitive
extraction of trace-element magmas”. (2) The Grande Ronde lavas
have the highest silica content of all SCR lavas”’, and are best explained
by melting of a mafic crust’’. (3) The SCR formations experienced a
progressive assimilation of material from the overriding continent and
from subducted sediments’’””°.

Figure 3 | Three-dimensional view of the tearing slab. a, Top view of an
isothermal slab surface (−100 °C relative to ambient temperature) and the
velocity field. The velocity vectors are on the same spherical plane across all
surface plates. The white dashed oval outlines a region where asthenosphere
upwelling occurred. We note that the upwellings are restricted to below the
Farallon slab where velocity vectors originate (see Fig. 2 for detailed flow fields).
b, Bottom view showing distribution of asthenosphere upwellings beneath the
mid-ocean ridge (long red shape) and propagating slab tear (short red shape).
The region inside the red surface represents mantle upwelling with radial
velocity magnitudes >3 cm yr⁻¹.
[Figure 3 legend: isosurface for radial velocity = 3.0 cm yr⁻¹.]

Table 1 | Summary of SCR basalts and proposed mechanisms

SCR formation* | Geochemical property'19-?? | Volume?4 (km³) | Source composition | Unreconciled aspects of previous models
Steens (16.6–16.2 Myr) | Low silica content; high εNd, low ⁸⁷Sr/⁸⁶Sr, incompatible element depletion | 60,000 | Subducting oceanic lithosphere | Highly depleted magma source (plume-head models”, lithosphere–plume?? or slab–plume?® interactions)
Imnaha (16.3–16.1 Myr) | Excess ²⁰⁶,²⁰⁸Pb/²⁰⁴Pb, excess Th, Nb and ³He/⁴He | 10,000 | Subducting oceanic lithosphere and sediments or mantle plume | –
Grande Ronde (16.1–15.0 Myr) | Silica saturated; low εNd, high ⁸⁷Sr/⁸⁶Sr, chemically homogeneous, incompatible element enrichment | 150,000 | Subducting oceanic crust and sediments and Archean mantle lithosphere | High SiO₂ and homogeneity (plume-head models?, lithosphere–plume?? or slab–plume?® interactions) and large volume and high SiO₂ (back-arc processes®®)

These three features are consistent with our proposed bottom-up
melting model as follows. The earliest eruption at Steens Mountain is
derived from the depleted subducting oceanic lithosphere, whose
melts also tend to be low in silica content. The upward melting across
the subducting slab will eventually melt its mafic crust in an avalanche
fashion, which best explains Grande Ronde’s homogeneous composition,
saturation in silica content, and high degree of incompatible-element
enrichment”. With the mantle isotherm shifted towards
shallower depths, a progressive involvement of sediments from the 
subducted sea floor and of the overriding lithosphere and crust is 
expected, which explains the transitional composition of Imnaha 
lavas'”°. Our proposed mechanism, however, does not preclude the 
existence of secondary crustal processes, such as crystal fractionation, 
before surface eruption. 

Of all SCR formations, elevated ³He/⁴He ratios in a subpopulation
of Imnaha basalts are the only ones that might indicate a mantle plume
signature*'. However, involvement of a plume head should lead to both
highly heterogeneous and enriched compositions throughout SCR 
lavas, which is contrary to observations and suggests that if a mantle 
plume ever existed, it played a very minor part in defining the SCR 
flood basalt. The remarkable spatial/temporal correlations shown in 
Fig. 1 and the consistency with geochemistry (Table 1) strongly favour a 
slab-rupture model over those with a plume-head origin. Furthermore, 
Palaeocene-Eocene-aged coastal ranges are interpreted as an accreted 
island chain formed by the Yellowstone hotspot**** and provide 
possible evidence for a long-lived Yellowstone plume conduit that 
pre-dates the SCR event. We speculate that the Farallon slab captured 
the plume conduit and caused a short hiatus of Yellowstone-related 
volcanism during the early Miocene epoch, consistent with the plume- 
affinity of Imnaha basalts resulting from a northeast-deflected plume 
conduit”. Subsequently, accompanying the slab rupture, the plume 
conduit was reestablished to its original position along the ESRP*’, 
where a slab gap has existed throughout the upper mantle till the 
present day’®, and provided a path for the plume conduit to reach 
the surface (Supplementary Fig. 2). However, the interactions of this 
putative plume conduit with the Farallon slabs remain to be clarified. 


METHODS SUMMARY 


To explain the complex present-day mantle structure beneath the western USA, we use forward geodynamic models to simulate Farallon subduction during the past 40 Myr (ref. 10). The models are consistent with several types of palaeo-records, and both the initial and boundary conditions are observationally constrained. The thermal structure of the oceanic plate is derived from palaeo-seafloor ages. With a numerical resolution as fine as 7 km, the models can effectively represent complex thermal and rheological structures along plate boundaries, including mid-ocean ridges, transform faults and subduction zones. The viscosity profiles of these fine features, especially those along convergent plate boundaries, are constrained so that the subducting slab hinge closely follows the time history of observed trench locations. We also introduce a pseudo-free-surface function, which allows subduction to occur more naturally than in models without it. The mid-Miocene slab tear event described in this paper was critical for the formation of the present-day mantle structure. We find that the exact shape and position of present mantle structure are very sensitive to the magnitude of mantle and slab viscosities, but the slab tear starting around 17 Myr ago was not. The tear always occurred with similar timing and location, even when mantle viscosities varied by one order of magnitude around the best-fit values. This slab tear, however, no longer occurred when we excluded the pseudo-free-surface function from the calculation, indicating that the near-trench sub-slab dynamic pressure controls formation of the slab tear (Supplementary Fig. 3). Of all the models, the one that best predicts the seismic image also seems to match the SCR flood basalt stratigraphy best, which provides further validation of the geodynamic model.


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS


Model set-up. We used CitcomS, a three-dimensional spherical finite element code for mantle convection, to simulate Farallon subduction. The code solves for an incompressible Newtonian fluid within a regional spherical mantle shell. We adopted a mesh with 257 × 257 × 65 nodes in latitude × longitude × depth, covering a physical domain of 60° × 100° × 2,760 km, respectively. The mesh had variable grid spacing in all three dimensions, with resolution increasing towards the centre and the surface. The finest mesh grid had a block size of 12 km × 20 km × 7 km beneath the western USA. We ensured that the model box was wide enough to avoid sidewall-induced artificial flows: the nearest vertical boundary was over 2,000 km away from any part of the western USA. We also tested the effect of box depth on mantle flow and found that, as long as the depth exceeds 1,000 km, the resulting upper-mantle structures are similar. We therefore chose a mantle depth of approximately 2,700 km.
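As a rough check on the resolution figures above, the sketch below compares the average spacing of a hypothetical *uniform* mesh with the same node counts against the finest block size reported beneath the western USA. It assumes a mean Earth radius of 6,371 km and measures angular extents as surface great-circle arcs. Both are simplifications of ours, and longitude spacing is overstated at mid-latitudes. This is not how CitcomS itself lays out its mesh.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def mean_spacing_km(extent, nodes, angular=True):
    """Average node spacing of a *uniform* grid with `nodes` nodes spanning `extent`.

    Angular extents (degrees) are converted to surface arc length along a great
    circle; non-angular extents are taken to be in km already.
    """
    length_km = math.radians(extent) * EARTH_RADIUS_KM if angular else extent
    return length_km / (nodes - 1)


# Domain and node counts quoted in the Methods (latitude, longitude, depth).
uniform = {
    "latitude": mean_spacing_km(60.0, 257),
    "longitude": mean_spacing_km(100.0, 257),
    "depth": mean_spacing_km(2760.0, 65, angular=False),
}
# Finest block size reported beneath the western USA (km).
finest = {"latitude": 12.0, "longitude": 20.0, "depth": 7.0}

for axis, spacing in uniform.items():
    print(f"{axis:9s} uniform ~{spacing:5.1f} km, finest {finest[axis]:4.1f} km, "
          f"local refinement ~{spacing / finest[axis]:.1f}x")
```

The uniform averages come out near 26 km, 43 km and 43 km. The strongest local refinement is therefore in depth (roughly sixfold), which is where near-surface plate-boundary structure needs it most.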

Initial and boundary conditions. These are essential components of a data assimilation model. The initial condition is both observationally constrained and numerically tested. The starting time of 40 Myr ago is consistent with both the Cascade volcanism history and present-day seismic tomography. Tests suggest that starting subduction at this initial time is more than sufficient to capture the Farallon evolution since the Miocene, which is the period of interest in this study. The boundary conditions, including both the thermal structure and surface kinematics, are based on a recent plate reconstruction. We define the top thermal boundary layer in oceanic plates using time-dependent palaeo-seafloor ages, with the profile largely following a half-space cooling model. The plate motions are imposed as top velocity boundary conditions at every time step during the calculation.
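The half-space cooling model mentioned above gives the temperature at depth z beneath seafloor of age t as T(z, t) = Ts + (Tm − Ts) erf(z / (2√(κt))). The sketch below evaluates that textbook profile. The surface temperature (0 °C), mantle temperature (1,350 °C) and thermal diffusivity (10⁻⁶ m² s⁻¹) are illustrative defaults chosen by us, not values stated in the paper. It shows how a seafloor-age grid translates into a top thermal boundary layer that thickens with age.

```python
import math

SECONDS_PER_MYR = 1e6 * 365.25 * 24 * 3600


def half_space_temperature(depth_km, age_myr, t_surface=0.0, t_mantle=1350.0, kappa=1e-6):
    """Half-space cooling: T(z, t) = Ts + (Tm - Ts) * erf(z / (2 * sqrt(kappa * t))).

    depth_km: depth below the seafloor; age_myr: seafloor age.
    t_surface, t_mantle (deg C) and kappa (m^2/s) are illustrative defaults.
    """
    if depth_km <= 0.0:
        return t_surface
    if age_myr <= 0.0:
        return t_mantle  # zero-age seafloor: no conductive boundary layer yet
    z = depth_km * 1e3
    t = age_myr * SECONDS_PER_MYR
    return t_surface + (t_mantle - t_surface) * math.erf(z / (2.0 * math.sqrt(kappa * t)))


def boundary_layer_thickness_km(age_myr, kappa=1e-6):
    """Depth at which T reaches ~90% of the surface-to-mantle contrast (erf(1.16) ~ 0.9)."""
    return 2.0 * 1.16 * math.sqrt(kappa * age_myr * SECONDS_PER_MYR) / 1e3


for age in (5.0, 20.0, 40.0):
    profile = [round(half_space_temperature(z, age)) for z in (0, 10, 25, 50, 100)]
    print(f"{age:4.0f} Myr: T at 0/10/25/50/100 km = {profile} C, "
          f"layer ~{boundary_layer_thickness_km(age):.0f} km")
```

The boundary-layer thickness grows as √t. Older seafloor arriving at the trench is therefore thicker and colder, which is what gives subducting slabs their negative buoyancy in such models.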

Another important feature of our calculation is a pseudo-free surface. A ‘sticky air’ layer on top of a viscous fluid has been shown to reproduce the realistic slab geometry near a subduction zone that is found in laboratory experiments. We converted this sticky-air function into a phase transformation within the uppermost two elements of the model. As a result, the oceanic plate within this layer gains some extra negative buoyancy, which mimics the lateral pressure gradient at a convergent boundary caused by the low topography of the trench. This effectively reproduces the zero-density ‘sticky air’ implementation, because both increase the lateral buoyancy gradient that promotes the plate’s tendency to sink asymmetrically at the subduction front.

Rheologic structures. We use both depth- and temperature-dependent viscosity. A four-layer viscosity structure is assumed, comprising lithosphere, asthenosphere, transition zone and lower mantle. The viscosity magnitudes of these layers are subsequently constrained by predicting the present-day seismic tomography image. Density changes due to the phase transformations across the 410-km and 660-km interfaces were also considered. Lateral variation of viscosity arises first from temperature dependence, assuming a Newtonian fluid. We did not use a nonlinear rheology because of existing controversies over the viscous strength of mantle and slabs. Instead, we circumvent this problem by searching for the effective viscosities of ambient mantle and slabs that best predict the observed present-day slab morphology across the entire upper mantle. Furthermore, the model incorporates many sharp rheological features:

- narrow plate boundaries (vertical at mid-ocean ridges and transform faults, and one-side dipping above down-going slabs);
- slab hinges (weak bending parts of the slab with a stronger core);
- a mantle wedge (a weak zone between the surface and the slab);
- large rheological variations from the Basin and Range province to cratonic North America.

In effect, the model can achieve a viscosity contrast of up to four orders of magnitude within a distance of 100 km. We find that the resulting strong three-dimensional viscosity variation is essential for generating asymmetric subduction and a physically reasonable subducting slab whose hinge closely follows the position of trenches as a function of time.
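The combination of a four-layer depth structure with temperature dependence described above can be sketched as a layer lookup multiplied by an exponential temperature factor. Everything numerical here is a hypothetical placeholder: the layer bottoms, the reference viscosities, the activation parameter (chosen so cold material is 1,000 times stiffer) and the clipping bounds. The paper constrains the real magnitudes by fitting tomography, and its exact functional form is not given in this section.

```python
import math

# Hypothetical four-layer reference viscosities (Pa s) and layer bottoms (km);
# the paper fits the actual magnitudes to present-day tomography.
LAYERS = (
    (100.0, 1e22),         # lithosphere
    (410.0, 1e20),         # asthenosphere
    (660.0, 1e21),         # transition zone
    (float("inf"), 3e22),  # lower mantle
)


def reference_viscosity(depth_km):
    """Depth-dependent part: look up the layer containing depth_km."""
    for bottom_km, eta in LAYERS:
        if depth_km < bottom_km:
            return eta
    return LAYERS[-1][1]


def viscosity(depth_km, temp_nd, activation=math.log(1e3), eta_min=1e19, eta_max=1e24):
    """Newtonian viscosity with exponential temperature dependence.

    temp_nd is a nondimensional temperature (0 = surface, 1 = ambient mantle), so
    with the default activation, fully cold material is 1,000 times stiffer than
    ambient mantle at the same depth. Values are clipped to [eta_min, eta_max].
    """
    eta = reference_viscosity(depth_km) * math.exp(activation * (1.0 - temp_nd))
    return min(max(eta, eta_min), eta_max)


# Lateral contrast at 300 km depth between a cold slab core and hot ambient mantle.
slab, mantle = viscosity(300.0, 0.2), viscosity(300.0, 1.0)
print(f"slab {slab:.2e} Pa s, mantle {mantle:.2e} Pa s, contrast {slab / mantle:.0f}x")
```

With these placeholder numbers, a slab core at 300 km is about 250 times stiffer than the surrounding asthenosphere. This illustrates how temperature alone generates large lateral contrasts before any sharp weak zones (plate boundaries, hinges, mantle wedge) are added.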
Mid-Miocene slab tear. With the model set-up described above, we find that a segmented present-day mantle structure can easily be reached. However, to predict the exact structures observed by tomography, we have to find the appropriate viscosities for both the ambient mantle and slabs. Interestingly, besides the best-fitting model, runs whose viscosities are several times larger or smaller than the best-fitting values still have the Farallon slab break at a similar time and location. This holds even though their predicted present-day structures are far from the observations, and it suggests that the slab tear was not sensitive to the large-scale viscosity structure. The detailed viscosity profile of the subduction zone and mantle wedge, on the other hand, plays a major part in the behaviour of the subducting slab, including its smoothness, curvature and trench retreat rate. The slab hinge must follow trench position through time, which is a critical constraint from observations, and this requires a weak subduction zone. However, we did not find evidence that this fine-scale rheological structure controls slab tear formation.

Other parameters that may control the slab tear include surface plate motions and sub-slab dynamic pressure. We found that the imposed surface kinematics cannot generate the slab tear without enough sub-slab dynamic pressure; removing the sticky-air function decreases the dynamic pressure under the slab. When plate motions slowed down and trench rollback increased during the Miocene, as observed, the slab simply rolled back faster. It still followed the trench position without breaking internally around the mid-Miocene (Supplementary Fig. 3). The sub-slab dynamic pressure, however, is more important. Because the dynamic pressure is an intrinsic property of the three-dimensional slab geometry, it cannot be entirely separated out for a sensitivity test in a multi-parameter model like ours. We can nonetheless test its effect by turning the sticky-air function on and off, because the function increases the sub-slab dynamic pressure by accounting for the effect of trench topography. We found that without the sticky air, the slab never broke around the mid-Miocene, even with the slow-down of surface plates, and the predicted present-day mantle structures never matched the tomography image. With the sticky air, the excess sub-slab dynamic pressure associated with the shrinking width of the Farallon plate builds up with time. It eventually breaks the slab midway along the trench, where the pressure is greatest (Fig. 3, Supplementary Fig. 3).
The mapping of expression quantitative trait loci (eQTLs) has emerged as an important tool for linking genetic variation to changes in gene regulation. However, it remains difficult to identify the causal variants underlying eQTLs, and little is known about the regulatory mechanisms by which they act. Here we show that genetic variants that modify chromatin accessibility and transcription factor binding are a major mechanism through which genetic variation leads to gene expression differences among humans. We used DNase I sequencing to measure chromatin accessibility in 70 Yoruba lymphoblastoid cell lines, for which genome-wide genotypes and estimates of gene expression levels are also available. We obtained a total of 2.7 billion uniquely mapped DNase I-sequencing (DNase-seq) reads, which allowed us to produce genome-wide maps of chromatin accessibility for each individual. We identified 8,902 locations at which the DNase-seq read depth correlated significantly with genotype at a nearby single nucleotide polymorphism or insertion/deletion (false discovery rate = 10%). We call such variants ‘DNase I sensitivity quantitative trait loci’ (dsQTLs). We found that dsQTLs are strongly enriched within inferred transcription factor binding sites and are frequently associated with allele-specific changes in transcription factor binding. A substantial fraction (16%) of dsQTLs are also associated with variation in the expression levels of nearby genes (that is, these loci are also classified as eQTLs). Conversely, we estimate that as many as 55% of eQTL single nucleotide polymorphisms are also dsQTLs. Our observations indicate that dsQTLs are highly abundant in the human genome and are likely to be important contributors to phenotypic variation.

It is now well established that eQTLs are abundant in a wide range of cell types and in diverse organisms, and recent studies have implicated human eQTLs as important contributors to phenotypic variation. However, the regulatory mechanisms by which eQTLs affect gene expression remain poorly understood. One potentially important mechanism arises when the alternative alleles at a particular single nucleotide polymorphism (SNP) lead to different levels of transcription factor binding or nucleosome occupancy at regulatory sites; this in turn may lead to allele-specific differences in transcription rates. In this study we used DNase-seq in a panel of 70 individuals and found that a large fraction of eQTLs are probably caused by this type of mechanism.

DNase-seq is a genome-wide extension of the classical DNase I footprinting method. This assay identifies regions of chromatin that are accessible (or ‘sensitive’) to cleavage by the DNase I enzyme. Such regions are referred to as DNase I-hypersensitive sites (DHSs). DNase I sensitivity provides a precise, quantitative marker of regions of open chromatin. It is well correlated with a variety of other markers of active regulatory regions, including promoter-associated and enhancer-associated histone marks. Furthermore, bound transcription factors protect the DNA sequence within a binding site from DNase I cleavage, often producing recognizable ‘footprints’ of decreased DNase I sensitivity.

We collected DNase-seq data for 70 HapMap Yoruba lymphoblastoid cell lines for which gene expression data and genome-wide genotypes were already available. We obtained an average of 39 million uniquely mapped DNase-seq reads per sample, providing individual maps of chromatin accessibility for each cell line (see Supplementary Information for all analysis details). Our data allowed us to characterize the distribution of DNase I cuts within individual hypersensitive sites at extremely high resolution. As expected, the DHSs coincided to a great extent with previously annotated regulatory regions, and DNase I sensitivity was positively correlated with the expression levels of nearby genes (Supplementary Figs 6 and 7). Overall, the locations of hypersensitive sites were highly correlated across individuals (Supplementary Information).

We tested for genetic variants that affect local chromatin accessibility. To do this, we divided the genome into non-overlapping 100-base-pair (bp) windows and focused our analysis on the 5% of windows with the highest DNase I sensitivity (see Supplementary Information). For each individual, we treated the number of DNase-seq reads in a given window, divided by the total number of mapped reads, as a quantitative trait that estimated the level of chromatin accessibility. We then tested for association between individual-specific DNase I sensitivity in each window and the genotypes of all SNPs and insertions/deletions (indels) in a cis-candidate region of 40 kilobases (kb) centred on the target window.
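The windowed test described above can be sketched in a few small steps:

1. Normalize read counts by each individual's total mapped reads.
2. Keep the top 5% of windows by total sensitivity.
3. Select variants in a 40-kb cis region centred on the window.
4. Compute an association statistic.

The sketch below assumes a plain least-squares regression of the trait on genotype dosage (0/1/2), and the function names are hypothetical. It omits the additional rescaling, GC and principal-component adjustments that the Methods Summary describes for the real analysis.

```python
import numpy as np


def window_sensitivity(read_counts, total_mapped):
    """Per-individual DNase I sensitivity: reads per window / total mapped reads.

    read_counts: (n_individuals, n_windows) counts in 100-bp windows.
    total_mapped: (n_individuals,) total uniquely mapped reads per individual.
    """
    return read_counts / np.asarray(total_mapped, dtype=float)[:, None]


def top_windows(read_counts, fraction=0.05):
    """Indices of the `fraction` of windows with the highest total read count."""
    totals = read_counts.sum(axis=0)
    k = max(1, int(round(fraction * totals.size)))
    return np.argsort(totals)[::-1][:k]


def cis_variants(window_start, variant_positions, window_bp=100, cis_bp=40_000):
    """Indices of variants within a cis region of cis_bp centred on the window."""
    centre = window_start + window_bp // 2
    return np.flatnonzero(np.abs(np.asarray(variant_positions) - centre) <= cis_bp // 2)


def association_t(trait, dosage):
    """t-statistic of the OLS slope of trait on genotype dosage (0, 1, 2)."""
    x = np.asarray(dosage, dtype=float)
    y = np.asarray(trait, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    sxx = np.sum(xc ** 2)
    beta = np.sum(xc * yc) / sxx
    resid = yc - beta * xc
    se = np.sqrt(np.sum(resid ** 2) / (x.size - 2) / sxx)
    return beta / se
```

In use, one would loop over the windows returned by `top_windows`, compute `window_sensitivity` for each, and call `association_t` for every variant index returned by `cis_variants`. The resulting statistics are then converted to p-values and passed to a multiple-testing correction.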

Using this procedure, we identified associations between genotypes and inter-individual variation in DNase-seq read depth in 9,595 windows at a false discovery rate (FDR) of 10% (Fig. 1a). These correspond to 8,902 distinct DHSs once adjacent windows whose hypersensitivity data were associated with the same SNP or indel are combined. We refer to these 8,902 loci as ‘DNase I sensitivity QTLs’, or dsQTLs, and show an example in Fig. 1c-f. We additionally considered a much smaller cis-candidate region of only 2 kb around each target window. Most of the dsQTLs were detected within this smaller region (7,088 associated windows in 6,070 DHSs), suggesting that most dsQTLs lie close to the target DHS. In contrast, we found only weak evidence of trans-acting dsQTLs, probably because our experiment was underpowered for detecting them (Supplementary Information). For dsQTLs with enough DNase-seq reads overlapping the most significant SNP (n = 892), we confirmed that the fraction of reads carrying each allele in heterozygotes was well correlated with the dsQTL effect sizes (correlation coefficient r = 0.72, P< 10 7°; Fig. 1b).
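This section reports results at a 10% FDR but does not say how the FDR was computed; the Supplementary Information may use a different, for example permutation-based, procedure. As a generic stand-in, the Benjamini-Hochberg step-up procedure below shows how a list of association p-values is thresholded at a target FDR. It is not presented as the authors' method.

```python
import numpy as np


def benjamini_hochberg(pvalues, fdr=0.10):
    """Boolean mask of tests rejected by the Benjamini-Hochberg step-up procedure."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = fdr * np.arange(1, m + 1) / m
    passed = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.flatnonzero(passed).max()  # largest rank whose p-value meets its threshold
        reject[order[: k + 1]] = True     # reject everything up to and including that rank
    return reject


pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(pvals, fdr=0.05).sum())  # prints 2
```

Because the procedure is step-up, every test ranked at or below the largest passing rank is rejected, even if an intermediate p-value misses its own threshold.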

We observed that dsQTLs typically affected chromatin accessibility over about 200-300 bp (Fig. 2a). Of the DHSs affected by dsQTLs, 77% lie in chromatin regions previously predicted to be functional in lymphoblastoid cell lines: 41% in predicted enhancers, 26% in promoters and 10% in insulators. Those chromatin states together cover only 6.7% of the genome overall (and 38% of our hypersensitive sites).

Figure 1 | Genome-wide identification of dsQTLs and a typical example. a, Q-Q plots for all tests of association between DNase I cut rates in 100-bp windows and variants within 2-kb (green) and 40-kb (black) regions centred on the target DHS windows. b, Allele-specific analysis of dsQTLs in heterozygotes. Plotted are the predicted (x axis) and observed (y axis) fractions of reads carrying the major allele based on the genotype means. c, Example of a dsQTL (rs4953223). The black line indicates the position of the associated SNP. d, Box plot showing that rs4953223 is strongly associated with local chromatin accessibility (P = 3 x 10-1). e, The T allele, which is associated with low DNase I sensitivity, disrupts the binding motif of a previously identified NF-κB-binding site at this location. f, NF-κB ChIP-seq data from ten individuals indicate a strong effect of this SNP on NF-κB binding.

We next studied the properties of cis-acting variants that generated dsQTLs using a Bayesian hierarchical model that accounted for uncertainty about which sites are causal (Supplementary Information). Because of linkage disequilibrium, it was typically uncertain which site was causal for any individual dsQTL. Even so, this model obtained unbiased estimates of the average properties of causal sites (Supplementary Information). As shown in Fig. 2b,c, most dsQTLs were generated by variants close to the target window. We estimate that 56% of the dsQTLs were due to variants that lay within the same DHSs and that 67% lay within 1 kb of the target window. dsQTLs that lay more than 1 kb from the target window were themselves significantly enriched in non-adjacent DHS windows (2.4-fold compared with matched random SNPs). They were also often associated with changes in sensitivity in multiple non-adjacent DHS windows (Supplementary Fig. 15).

One intuitive explanation is that dsQTLs are caused by variants that strengthen or weaken individual transcription factor binding sites. Such variants change transcription factor affinity and local nucleosome occupancy, and hence DNase I cut rates. Consistent with this model, an aggregated plot of DNase sensitivity at dsQTLs showed a distinct drop in chromatin accessibility around putatively causal SNPs, reminiscent of transcription factor binding footprints, especially in the genotypes associated with high sensitivity.

To test the importance of disruption of transcription factor binding sites as a mechanism underlying dsQTLs, we again turned to the Bayesian hierarchical model. We used the union of all published footprint locations in lymphoblastoid cell lines, together with a set of footprints that we identified from the DNase-seq data reported in this study (Supplementary Methods). Analysis using the hierarchical model indicated a 3.6-fold enrichment of dsQTLs within transcription factor binding footprints (P< 10 **°), controlling for the overall enrichment within DHSs. In addition, the allele with the higher position weight matrix score is typically associated with higher chromatin accessibility (P< 10 1°). This is consistent with the expectation that higher transcription factor binding affinity leads to more open chromatin (Fig. 2d). We then considered dsQTLs that fell within DNase-seq footprints tied to specific transcription factor motifs (using CENTIPEDE). Among these, motifs for CCCTC binding factor (CTCF), the cAMP-response element (CRE) and the interferon-stimulated response element (ISRE) were the most enriched, whereas MADS box transcription enhancer factor 2 (MEF2) was significantly depleted.
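The allele comparison above relies on scoring each allele against a position weight matrix (PWM). The sketch below builds a standard log2-odds PWM from per-position base counts, using a pseudocount and a uniform background, and scores a site carrying either allele. The motif counts are a hypothetical toy example, not a published motif, and the exact scoring used for Fig. 2d may differ.

```python
import math

BASES = "ACGT"


def pwm_from_counts(counts, pseudocount=0.5, background=0.25):
    """Log2-odds PWM from per-position base counts against a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((column[b] + pseudocount) / total / background) for b in BASES})
    return pwm


def pwm_score(pwm, seq):
    """Sum of per-position log-odds for a sequence the same length as the motif."""
    if len(seq) != len(pwm):
        raise ValueError("sequence length must equal motif length")
    return sum(column[base] for column, base in zip(pwm, seq.upper()))


def allele_scores(pwm, site, offset, allele_a, allele_b):
    """PWM scores of the site carrying allele_a and allele_b at position `offset`."""
    with_a = site[:offset] + allele_a + site[offset + 1:]
    with_b = site[:offset] + allele_b + site[offset + 1:]
    return pwm_score(pwm, with_a), pwm_score(pwm, with_b)


# Toy 4-bp motif (hypothetical counts, consensus TGAC).
toy_counts = [
    {"A": 1, "C": 1, "G": 1, "T": 17},
    {"A": 1, "C": 1, "G": 17, "T": 1},
    {"A": 17, "C": 1, "G": 1, "T": 1},
    {"A": 1, "C": 17, "G": 1, "T": 1},
]
pwm = pwm_from_counts(toy_counts)
score_a, score_c = allele_scores(pwm, "TGAC", offset=2, allele_a="A", allele_b="C")
print(f"A allele {score_a:.2f}, C allele {score_c:.2f}, difference {score_a - score_c:.2f}")
```

Because the two alleles differ at a single position, the score difference reduces to the difference between that column's two log-odds entries. This value, log2(17.5/1.5) ≈ 3.54 here, is the quantity whose sign is compared with the direction of the dsQTL effect.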

To further understand the functional consequences of dsQTLs, we examined ChIP-seq data for nine transcription factors collected by the ENCODE Project in one or more lymphoblastoid cell lines. Overall, the alleles associated with increased DNase I sensitivity were highly associated with increased transcription factor binding (P< 10 '°; Fig. 2e). This indicates that dsQTLs are strong predictors of changes in occupancy by a range of DNA-binding proteins.

high-confidence dsQTLs that lie within the target DHS. Individuals were separated into the high-sensitivity (blue), heterozygote (green) and low-sensitivity (red) classes. The shading indicates the bootstrap 95% confidence intervals. b, The peak density of dsQTLs is very tightly focused around the target DHS window. c, Total fraction of cis-dsQTLs that fall into different categories of distance from the target window (x axis) and different annotations (y axis). The total area of each rectangle is proportional to the estimated number of dsQTLs in that category. d, Box plot showing the distribution of position weight matrix (PWM) score differences between high-sensitivity and low-sensitivity dsQTL alleles. Notches indicate 95% confidence intervals for

major allele based on the DNase I genotype means; the y axis shows the observed fraction in ChIP-seq data. The lines show the regression fits for each factor separately; the numbers in the key show the fraction of sites that are in a concordant direction for each factor. CTCF, CCCTC binding factor; BATF, basic leucine zipper transcription factor; BCL11A, B-cell CLL/lymphoma 11A zinc-finger protein; EBF, early B-cell factor 1; IRF4, interferon regulatory factor 4; POU2F2, POU class 2 homeobox 2; PU1, proviral integration oncogene spi1; SP1, Sp1 transcription factor; NF-κB, nuclear factor of κ light polypeptide gene enhancer in B-cells 1.

Given that dsQTLs produce sequence-specific changes in chromatin accessibility and, frequently, changes in transcription factor binding, we speculated that a fraction of the dsQTL variants might also affect expression levels of nearby genes. We examined this by testing for associations between two quantities. The first was the most significant variant at each of the dsQTLs detected with the 2-kb window size. The second was the expression level of nearby genes (that is, genes with transcription start sites (TSSs) within 100 kb), estimated by sequencing RNA from the same cell lines. Using this approach, we found that 16% of dsQTL SNPs were also significantly associated with variation in expression levels of at least one nearby gene (FDR = 10%). This represents a huge enrichment over random expectation (450-fold, P< 10~'*; Fig. 3). One example of a joint dsQTL-eQTL is illustrated in Fig. 3a. Here, a SNP disrupts an ISRE located in the first intron of the SLFN5 gene, leading to both a strong dsQTL and an eQTL for SLFN5. Conversely, out of 1,271 eQTLs detected using RNA-seq data from these cell lines, 23% of the most significant SNPs were also dsQTLs (FDR = 10%). We also used the method in ref. 24 to estimate the proportion of tests in which the null hypothesis is false while accounting for incomplete power. By this estimate, 55% of the most significant eQTL SNPs are also dsQTLs and 39% of the dsQTLs are also eQTLs. dsQTLs are therefore a major mechanism by which genetic variation may affect gene expression levels.

Figure 3 | Relationship between dsQTLs and eQTLs. a, Example of a dsQTL SNP that is also an eQTL for the gene SLFN5. The SNP disrupts an interferon-sensitive response element, thereby changing local chromatin accessibility within the first intron of SLFN5. Expression of SLFN5 has been shown to be inducible by interferon α in melanoma cell lines. DNase-seq (left) and RNA-seq (right) measurements are plotted, stratified by genotype at the putative causal SNP. b, Q-Q plot of the t-statistic for association with gene expression changes (eQTL) of dsQTL SNPs. The sign of the eQTL t-statistic is with respect to the genotype that increases DNase sensitivity.

We observed that for most (70%) of the joint dsQTL-eQTLs, the allele associated with increased chromatin accessibility was also associated with increased gene expression levels (Fig. 3b). Because higher DNase I sensitivity generally correlates with higher transcription factor occupancy, this suggests that transcription factors bound to DHSs usually act as enhancers. CRE-box and ETS-box were the most enriched motifs among repressors and enhancers, respectively. The dsQTLs that were also eQTLs (FDR = 10%) were highly enriched around the TSSs of the target genes. For 23% of the joint dsQTL-eQTLs, the associated DHS was within 1 kb of the TSS, and for 39% it was within 10 kb (Fig. 4a). This is consistent with previous work showing strong clustering of eQTLs around TSSs. Nonetheless, there was a significant signal of long-range regulation as far as 100 kb. In addition, 14% of the joint dsQTL-eQTLs were significant eQTLs for two or more genes, suggesting that some regulatory regions affect more than one gene.

We sought to identify additional factors that might influence whether a dsQTL regulates expression of nearby genes, while controlling for the very strong effect of distance from the TSS (Fig. 4b). A dsQTL was more likely to be an eQTL for the gene with the nearest TSS (1.6-fold, P= 3 X 10 *). It was also more likely to be an eQTL if it was located within the transcribed region of the gene (2.7-fold, P= 2 X 10°). Further, a dsQTL was 2.6-fold more likely to be an eQTL if it was associated with a DHS that overlapped a DNA methylation QTL (P=4 x 10“). It showed a 2.4-fold increase if the associated DHS overlapped an RNA polymerase II ChIP-seq peak (P=4 X10 *). Conversely, a dsQTL was significantly less likely to be an eQTL for a gene if an active binding site for the insulator protein CTCF lay between the dsQTL and the gene’s TSS (2.4-fold decrease, P=10 "”). Finally, the presence of the enhancer mark P300 (from ENCODE ChIP-seq data) in the dsQTL window increased the probability that a distal dsQTL (distance to TSS > 1.5 kb) was an eQTL (1.7-fold, P=10~).
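The fold effects above come from a logistic model of eQTL status that controls for distance to the TSS (see Methods Summary and the log-odds scale of Fig. 4b). The sketch below fits such a model by Newton-Raphson (IRLS). Its input is *simulated* data in which a single hypothetical binary annotation has a true odds ratio of 2. Encoding distance as a linear term in log10 base pairs is our assumption. In a logistic fit, exp(coefficient) is an odds ratio; this is close to a fold change in probability only when eQTL status is rare.

```python
import numpy as np


def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Logistic regression by Newton-Raphson (IRLS). X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p)                          # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])   # Fisher information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Simulated stand-in data (not the paper's): eQTL status depends on log10
# distance to the TSS plus one binary annotation with a true odds ratio of 2.
rng = np.random.default_rng(0)
n = 50_000
log_dist = rng.uniform(2.0, 5.0, n)                # log10 bp from TSS
annotation = (rng.random(n) < 0.3).astype(float)   # hypothetical annotation flag
true_logit = 1.0 - 1.5 * log_dist + np.log(2.0) * annotation
is_eqtl = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)

X = np.column_stack([np.ones(n), log_dist, annotation])
beta = fit_logistic(X, is_eqtl)
odds_ratio = float(np.exp(beta[2]))
print(f"estimated fold change in odds for the annotation: {odds_ratio:.2f}")
```

Including distance as a covariate is what "controlling for distance" means here. Without it, an annotation that merely tends to sit near TSSs would appear to raise eQTL odds on its own.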


Figure 4 | Relationship between dsQTLs and eQTLs. a, Most joint dsQTL-eQTLs lie close to the gene TSS. b, Effect of various factors on the log odds that a given dsQTL is also an eQTL, while controlling for the strong distance relationship observed in a. In annotations (1) and (2) we do not consider the direction of transcription. In annotations (6-8) ChIP-seq is measured on the dsQTL window. In annotations (4) and (6), ‘meQTL’ refers to a dsQTL that is also associated with methylation levels of a nearby CpG site. ‘Pol II’ refers to the presence of an RNA polymerase II ChIP-seq peak overlapping the DHS associated with the dsQTL. One of the most significant annotations in delineating the regulatory regions is the presence of the CTCF insulator element, which reduces 2.4-fold the probability that a dsQTL is an eQTL. Error bars represent 95% confidence intervals.

We have shown here that common genetic variants affect chromatin accessibility at thousands of hypersensitive regions across the human genome. The putative causal variants most often lie within or very near the hypersensitive regions, and frequently act by changing the binding affinity of transcription factors. Mapping of dsQTLs provides a powerful tool for detecting potentially functional changes in a variety of different types of regulatory element, and roughly 50% of eQTLs are also dsQTLs. Furthermore, analysis of significantly associated SNPs from genome-wide association studies implicates some of these dsQTLs as potentially underlying a variety of genome-wide association study hits (Supplementary Information). Changes in chromatin accessibility may be a major mechanism linking genetic variation to changes in gene regulation and, ultimately, organismal phenotypes.
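The overlap figures behind the "roughly 50%" statement rely on estimating the proportion of tests in which the null hypothesis is false while accounting for incomplete power (ref. 24). This section does not spell out that estimator, so the sketch below assumes a Storey-type π0 estimate and reports π1 = 1 − π0. Null p-values are uniform, so the share above a cut-off λ estimates π0(1 − λ).

```python
import numpy as np


def estimate_pi1(pvalues, lam=0.5):
    """Fraction of tests with a false null, via a Storey-type pi0 estimate.

    Under the null, p-values are uniform, so the share of p-values above `lam`
    should be about pi0 * (1 - lam); dividing recovers pi0.
    """
    p = np.asarray(pvalues, dtype=float)
    pi0 = min(1.0, np.mean(p > lam) / (1.0 - lam))
    return 1.0 - pi0


# Toy check: 1,000 evenly spread "null" p-values plus 1,000 tiny "true signal" ones.
null_p = np.linspace(0.0005, 0.9995, 1000)
signal_p = np.full(1000, 1e-6)
print(estimate_pi1(np.concatenate([null_p, signal_p])))  # prints 0.5
```

Applied to the dsQTL p-values of the lead eQTL SNPs, π1 estimates the share of eQTLs that are dsQTLs. This estimate is larger than the fraction passing a fixed FDR cut-off, which is how 23% significant overlap can correspond to an estimated 55%.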


METHODS SUMMARY 


DNase-seq libraries were created as described previously, with small modifications. Each library was sequenced on at least two lanes of an Illumina GAIIx. The resulting 20-bp sequencing reads were mapped to the human genome sequence (hg18) using an algorithm that we designed specifically to eliminate mappability biases between sequence variants. We divided the genome into 100-bp windows and selected the top 5% in terms of total DNase I sensitivity. DNase I sensitivity for each individual in each window was normalized by the total number of mapped reads for that individual. For QTL mapping, the data were further rescaled within and across individuals. We also adjusted the data for an observed individual × GC interaction and for the top four principal components of the DNase I sensitivity matrix. Genotypes for all available SNPs and indels were obtained from HapMap and 1000 Genomes data and imputed where necessary. We performed DNase-seq association mapping by regressing the adjusted sensitivity in each window against the genotypes at variants in a 40-kb region centred on each DHS. As validation, we used our DNase-seq reads, as well as ChIP-seq and DNase-seq reads from ENCODE, to confirm that allele-specific reads spanning heterozygous sites at dsQTLs were consistent with the association analysis. We also used RNA-seq data from the same cell lines to study the links between dsQTLs and eQTLs. Finally, we explored the properties that made dsQTLs more or less likely to influence gene expression by fitting a logistic model on all dsQTLs. In this model, the eQTL status of each dsQTL-eQTL test was modelled as a function of distance from the TSS and a variety of other annotations. For full details of all methods see Supplementary Information.
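Adjusting for the top four principal components of the sensitivity matrix, as described above, is commonly done by subtracting the leading rank-k structure. The sketch below assumes that approach. It assumes an individuals × windows matrix, centres each window, and removes the top-k singular components computed via NumPy's SVD. The GC-interaction adjustment and rescaling steps are not shown, and the authors' exact procedure may differ.

```python
import numpy as np


def remove_top_pcs(sensitivity, k=4):
    """Remove the top-k principal components from an individuals x windows matrix.

    Each window (column) is mean-centred, then the rank-k approximation given by
    the leading singular vectors is subtracted; what remains is used as the trait.
    """
    centred = sensitivity - sensitivity.mean(axis=0, keepdims=True)
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    return centred - (u[:, :k] * s[:k]) @ vt[:k]


rng = np.random.default_rng(1)
toy = rng.normal(size=(10, 50))          # 10 individuals x 50 windows (simulated)
adjusted = remove_top_pcs(toy, k=4)
print(np.linalg.matrix_rank(adjusted))   # prints 5: centring costs one rank, the PCs four more
```

Removing broad axes of variation such as batch or culture effects in this way tends to sharpen genotype associations, because those effects otherwise act as noise shared across many windows.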
Extrathymically generated regulatory T cells control mucosal TH2 inflammation
A balance between pro- and anti-inflammatory mechanisms at mucosal interfaces, which are sites of constitutive exposure to microbes and non-microbial foreign substances, allows for efficient protection against pathogens. At the same time, it prevents adverse inflammatory responses associated with allergy, asthma and intestinal inflammation. Regulatory T (Treg) cells prevent systemic and tissue-specific autoimmunity and inflammatory lesions at mucosal interfaces. These cells are generated in the thymus (tTreg cells) and in the periphery (induced (i)Treg cells), and their dual origin implies a division of labour between tTreg and iTreg cells in immune homeostasis. Here we show that a highly selective blockage in the differentiation of iTreg cells in mice did not lead to unprovoked multi-organ autoimmunity or to exacerbation of induced tissue-specific autoimmune pathology. Nor did it increase the pro-inflammatory responses of T helper 1 (TH1) and TH17 cells. However, mice deficient in iTreg cells spontaneously developed pronounced TH2-type pathologies at mucosal sites (in the gastrointestinal tract and lungs) with hallmarks of allergic inflammation and asthma. Furthermore, iTreg-cell deficiency altered gut microbial communities. These results suggest that Treg cells generated in the thymus appear sufficient for control of systemic and tissue-specific autoimmunity. By contrast, extrathymic differentiation of Treg cells affects commensal microbiota composition and serves a distinct, essential function in restraining allergic-type inflammation at mucosal interfaces.

Exquisitely balanced control mechanisms operating at mucosal sites 
are able to accommodate potent immune defences and the need to 
prevent tissue damage resulting from inflammatory responses caused 
by commensal microorganisms, food and environmental antigens, 
allergens, and noxious substances.

Prominent among multiple regulatory lymphoid and myeloid cell subsets operating at environmental interfaces are Foxp3+ Treg cells. Genetic deficiency in Foxp3 (forkhead box P3, a key transcription factor specifying Treg cell differentiation) leads to paucity of Foxp3+ Treg cells and a consequent generalized lympho- and myelo-proliferative syndrome, featuring sharply augmented serum IgE levels, production of Th1, Th2 and Th17 cytokines, and widespread tissue inflammation. Foxp3 can be induced in thymocytes in response to T-cell receptor (TCR) and CD28 stimulation, and IL-2. In addition, Foxp3 can be upregulated upon TCR stimulation of mature peripheral CD4+ T cells in the presence of transforming growth factor β (TGFβ) in a manner dependent on an intronic Foxp3 enhancer, CNS1 (refs 3-5). Inflammatory cytokines and potent co-stimulatory signals antagonize the peripheral induction of Foxp3, and retinoic acid augments Foxp3 induction by mitigating inflammatory cytokine production and through cell-intrinsic mechanisms. Although they differ in their sites of generation, tTreg and iTreg cells are commingled in the secondary lymphoid organs and non-lymphoid tissues once mature. Their relative contributions to the total population of Treg cells, and their specific roles in control of various aspects of immune homeostasis and microbial colonization in normal animals, have remained unexplored.
Our recent investigation showed that CNS1, which contains binding sites for transcription factors (NFAT, Smad3 and RAR/RXR) downstream of three signalling pathways implicated in iTreg cell generation (Supplementary Fig. 1), is critical for TGFβ-dependent induction of Foxp3 but has no apparent role in tTreg differentiation or maintenance of Foxp3 expression. This observation suggested that CNS1 activity represents a dedicated genetic determinant for the differentiation of iTreg cells, and that its deficiency in mice provides a unique means to evaluate the function of these cells in vivo. Our initial characterization of CNS1− mice and littermates maintained on a 129/B6 genetic background failed to reveal disease phenotypes. Because mixed genetic backgrounds frequently mask adverse phenotypes or make them highly variable, we backcrossed CNS1− mice onto the B6 background to understand iTreg function in vivo (Supplementary Fig. 2).
First, we sought to ascertain that on the B6 genetic background CNS1 is dispensable for tTreg cell generation but critical for generation of iTreg cells. Two recent studies established a role for TGFβ signalling in tTreg cell differentiation in neonates. Thus, to exclude the possibility that CNS1 deficiency adversely affects generation of Foxp3+ T cells in the neonatal thymus, we examined the Foxp3+ Treg cell population in heterozygous female CNS1+/− mice. As Foxp3 is encoded on the X chromosome and is subject to random X-chromosome inactivation, characterization of female CNS1+/− mice allows for comparison of CNS1− and CNS1+ Treg cells in a competitive environment. In neonatal female CNS1+/− mice, CNS1− cells constituted, on average, one-half of the thymic Foxp3+ cell population (Fig. 1a). Additionally, neonatal CNS1 hemizygous and control males harboured comparable numbers of Foxp3+ thymocytes (Supplementary Fig. 3). Therefore, tTreg differentiation is independent of CNS1. In contrast, CNS1− naive CD4 T cells showed severely impaired induction of Foxp3 in vitro (Fig. 1b). Analyses of heterozygous female CNS1+/− mice and transfer of CNS1− or CNS1+ Treg cells into lymphopenic recipients demonstrated that the ability of Treg cells to accumulate and proliferate in various tissues was unperturbed in the absence of CNS1 (Supplementary Fig. 4). Furthermore, CNS1 deficiency did not affect suppressor activity of tTreg cells, assessed using in vitro suppression assays and adoptive transfers of Foxp3-deficient effector T cells with predominantly tTreg-containing Foxp3+ cells isolated from 4-week-old CNS1− and CNS1+ mice into lymphopenic recipients (Supplementary Fig. 5). Likewise, CNS1 ablation did not negatively affect maintenance of Foxp3 expression or the overall function of the NFAT, TGFβ and retinoic acid signalling pathways in these cells (Supplementary Fig. 5 and data not shown). To assess how the deficiency in iTreg cell generation affects the size of the peripheral Treg cell compartment, we analysed Treg cell frequencies in various tissues throughout the lifespan of mice. CNS1− mice failed to exhibit a progressive age-dependent increase in Foxp3+
Figure 1 | Impaired iTreg cell generation and altered composition of the peripheral Treg cell population in CNS1-deficient mice. a, Relative contribution of CNS1− (GFP+) and CNS1+ (GFP−) cells to the Foxp3+ thymocyte subset in 4-day-old CNS1+/− female mice. SP, single positive. b, Induction of Foxp3 in Foxp3− TN (naive) cells FACS sorted from CNS1− (knockout, KO) or Foxp3gfp mice stimulated in vitro with TGFβ, IL-2, anti-CD3 and anti-CD28. c, Percentage of Foxp3+ cells (of CD4+) in the spleen, lymph node (LN), mesenteric lymph nodes (MLN), Peyer's patches (PP) and cells from the small and large intestine lamina propria (SI and LI) of 6-9-month-old CNS1− or control mice. d, Percentage of transferred (CD45.2+) CNS1− or CNS1+ CD25−CD44loCD45.2+ OT-II+ cells that induced Foxp3 following administration of OVA in water for 6 days. e, Stability of Foxp3 expression in Treg cells. FACS-sorted GFP+ or GFP− cells from Foxp3eGFP-cre-ERT2 mice were co-transferred with GFP− or GFP+ cells, respectively, from CD45.1 Foxp3gfp mice into TCRβδ-deficient recipients. Mice received tamoxifen (TMX) at 1 (left) or 5 weeks (right) after transfer, and stability of Foxp3 expression among YFP-labelled cells was assessed after 4 weeks. All data are representative of two or more independent experiments with n = 3. Error bars, s.d.; *P < 0.05, **P < 0.01, ***P < 0.001, as calculated by Student's t-test.


cell frequencies observed in wild-type littermates (Fig. 1c and Supplementary Fig. 6). By 6-8 months of age, CNS1− mice contained markedly fewer Foxp3+ cells than control animals, with the most prominent differences in mesenteric lymph nodes, Peyer's patches, and small and large intestine lamina propria, sites known to support iTreg cell generation. This trend was not the result of expression of a Foxp3-GFP fusion protein in CNS1− mice, because age-matched CNS1+ Foxp3-GFP and littermate control CNS1+ mice expressing unmodified Foxp3 protein exhibited similar age-dependent increases in Treg cell frequencies (Supplementary Fig. 6).

To assess the extent of impairment of peripheral generation of Treg cells in vivo, we examined Foxp3 induction in antigen-specific naive T cells upon exposure to ingested 'non-self' antigen. Ovalbumin (OVA)-specific OT-II+ TCR-transgenic Foxp3− (GFP−) T cells from CNS1− or Foxp3gfp mice were transferred into CD45.1+ lymphoreplete recipients, followed by ad libitum administration of OVA in drinking water. We failed to detect Foxp3 induction in CNS1-deficient cells, whereas up to 20% of transferred OT-II T cells from control Foxp3gfp mice induced Foxp3 upon exposure to cognate antigen in the intestinal tract (Fig. 1d and Supplementary Fig. 7). These results agreed with a marked impairment in Foxp3 induction in polyclonal CNS1-deficient Foxp3− T cells in vitro, which was most severe at lower, more physiologically relevant concentrations of TGFβ (Fig. 1b). Together these data indicate that iTreg cells have a stringent requirement for CNS1 for their differentiation.

Recent studies showed a limited TCR-dependent clonal niche for tTreg cell differentiation and peripheral maintenance. The sustained numerical impairment in the peripheral Treg cell populations in CNS1-deficient mice suggests that tTreg cells fail to fill the 'void' in the peripheral Treg cell pool left by iTreg cell deficiency. This observation, combined with the largely non-overlapping TCR repertoires of tTreg and iTreg cells, suggests that iTreg and tTreg cells occupy distinct 'niches'. To test this notion, we co-transferred CNS1− (tTreg cells) or CNS1+ Treg cells (iTreg + tTreg) from aged mice with CNS1-sufficient naive CD45.1+Foxp3−CD4+ T cells into lymphopenic recipients. We observed more efficient Foxp3 induction in CD45.1+CD4+ T cells upon co-transfer with CNS1− Treg cells (tTreg cells), indicating that in lymphopenic recipients the de novo generation of iTreg cells is markedly more efficient in the absence of pre-existing iTreg cells (Supplementary Fig. 8). These data also imply the existence of a stable iTreg cell subset in normal mice. However, the dynamics and stability of Foxp3 expression have been controversial, with a number of studies favouring unstable Foxp3 expression in iTreg cells. Thus, we next employed genetic fate mapping using an inducible Cre recombinase expressed in a Treg-specific manner (Foxp3eGFP-cre-ERT2) and a Rosa26-YFP recombination reporter allele (R26Y) to determine whether iTreg cells generated in vivo are able to acquire stable Foxp3 expression and, thus, have the capacity to contribute to the stable Treg cell compartment.

Double-sorted naive CD45.2+Foxp3−YFP−CD4+ T cells from Foxp3eGFP-cre-ERT2 R26Y mice were transferred together with congenically marked CD45.1+Foxp3+ Treg cells into lymphopenic recipient mice. Foxp3 expression within the population of tagged YFP+ cells generated from YFP−Foxp3− precursors was assessed four weeks after treatment of recipient mice with tamoxifen, which was administered early (one week) or late (five weeks) following cell transfer. Approximately half of the newly generated YFP-tagged iTreg cells lost Foxp3 expression, whereas 'mature' iTreg cells tagged at a later time point displayed remarkable stability (>90% Foxp3+ cells among YFP+ cells), comparable to that of transferred peripheral Treg cells (Fig. 1e and Supplementary Fig. 9). Together these data indicate that iTreg cells have a stringent requirement for CNS1 for their differentiation, accumulate throughout life, and occupy a sizable fraction of the stable peripheral Treg cell compartment.

CNS1− mice on the B6 genetic background displayed neither early- nor late-onset systemic autoimmunity, nor spontaneous widespread tissue lesions, nor the severe morbidity associated with systemic Treg cell deprivation (data not shown). However, it was possible that iTreg cell deficiency might exacerbate initial or late stages of provoked tissue-specific autoimmune pathology directed against a self-antigen. To address this question, we induced experimental autoimmune encephalomyelitis (EAE) in CNS1-deficient or littermate control mice through immunization with myelin oligodendrocyte glycoprotein (MOG) peptide. The onset, severity and remission of disease were indistinguishable, and no detectable differences were observed in Treg cell subsets in the brain in these two groups of mice (Supplementary Fig. 10). Although it will be important to evaluate the role of iTreg cells in additional models of induced autoimmunity, these results indicate that tTreg cells are largely sufficient for control of tolerance to self-antigens, and that the distinct functional role of iTreg cells might be to control inflammation at mucosal surfaces, which are sites of preponderant exposure to non-self substances. This notion is consistent with data indicating that tTreg cells arise from a subset of thymocytes that express TCRs with an affinity for self-antigens that is increased yet insufficient for negative selection, whereas iTreg cells are efficiently generated upon TCR engagement with a high-affinity cognate ligand under subimmunogenic conditions.

The absence of iTreg cell induction in response to oral antigen in CNS1− mice suggested that the immune balance in the gastrointestinal tract might be impaired owing to deficiency in gut antigen-specific iTreg cells. Indeed, while IL-17 and IFN-γ production by CD4+ T cells was unaffected by iTreg deficiency in CNS1− mice (Supplementary Fig. 11), we observed markedly augmented production of the Th2 cytokines IL-4, IL-5 and IL-13 by CD4+ T cells, especially in the mesenteric lymph nodes, Peyer's patches and intestinal lamina propria (Fig. 2a and Supplementary Fig. 12). Furthermore, the vast majority of CD4+ T cells in the lamina propria of CNS1− mice expressed high amounts of Gata3, a key Th2 differentiation factor. Increases in Gata3+CD4+ T cells were observed not only in gastrointestinal tract tissues in CNS1− mice but also in other lymphoid tissues, albeit to a lesser extent (Fig. 2b and Supplementary Fig. 12). Consistent with the sharply augmented Th2 responses at mucosal sites, CNS1− mice exhibited increased frequencies of germinal centre B cells (Fas+GL7+) in the Peyer's patches, but not in the spleen or peripheral lymph nodes (Supplementary Fig. 13), and spontaneous increases in serum levels of IgE and IgA, but not of other Ig isotypes (Fig. 2c and data not shown).

The dysregulated Th2 responses were associated with decreased body weight (Fig. 3a and Supplementary Fig. 2) and a distinct, highly penetrant pathology throughout the gastrointestinal tract (Fig. 3b and Supplementary Fig. 14): all CNS1− mice (12/12) and no CNS1+ control littermates (0/6) were affected by gastritis and plasmacytic enteritis, characterized by increased frequencies of plasma cells in the intestinal lamina propria and other associated lesions such as crypt abscesses. Accordingly, serum antibodies in CNS1− mice exhibited reactivity against antigens of the small and large intestine, pancreas and chow (Supplementary Fig. 13). Notably, the pathology observed in the gastrointestinal tissue of CNS1− mice was markedly diminished upon B-cell depletion, but was not ameliorated by administration of IL-4 neutralizing antibody (Supplementary Fig. 15). The inflammatory
Figure 2 | Paucity of iTreg cells results in Th2 inflammation in the gastrointestinal tract. a, Percentage of CD4+ cells producing IL-4 (top), IL-13 (middle) and IL-5 (bottom) in 3-month-old mice. Left, spleen, peripheral lymph nodes (LN) and mesenteric lymph nodes (MLN); right, lamina propria of small and large intestine (SI and LI, respectively). b, Percentage of Foxp3−CD4+ cells that were Gata3+ in 3-month-old mice (PP, Peyer's patches). c, Concentration of IgE and IgA in serum, determined by enzyme-linked immunosorbent assay (ELISA) at 1, 3 and 10 months. All data are representative of three or more independent experiments with ≥3 mice per group. Error bars, s.d.; *P < 0.05, **P < 0.01, ***P < 0.001, as calculated by Student's t-test.
features and lesions observed in CNS1− mice were consistent with allergic Th2-type intestinal disease (Fig. 3).

One possible explanation for the pronounced Th2 responses and intestinal pathology associated with iTreg cell deficiency is simply a numerical decrease in Treg cells. However, we consider this possibility unlikely, because graded depletion of Foxp3+ Treg cells in Foxp3DTR mice upon administration of titrated amounts of diphtheria toxin, resulting in Treg frequencies similar to those observed in CNS1− mice, revealed augmented Th1 and Th17, but not Th2, responses. Alternatively, certain qualitative features of iTreg cells could allow them to efficiently limit Th2 inflammation in the gut. Recent studies suggested that some of the transcriptional regulators involved in a particular type of effector T-cell response facilitate the ability of Treg cells to suppress those responses. Thus, we explored the expression of the Th2-associated transcription factor Gata3 in Treg cells in CNS1− and CNS1+ mice. In contrast to a sharp increase in Gata3 expression in effector T cells (Fig. 2b and Supplementary Fig. 12), we found its expression markedly diminished in Treg cells in CNS1− mice (Fig. 3c and Supplementary Fig. 12). Notably, ablation of a conditional Gata3 allele in Treg cells leads to Treg cell dysfunction and marked augmentation of Th2 cytokine production by CD4+ T cells (D. Rudra, R.E.N. and A.Y.R., manuscript in preparation). We hypothesized that increased Gata3 expression in iTreg cells reflects their activation state upon TCR ligation by high-affinity ligands in the gut rather than an intrinsic feature of iTreg cells. In support of this idea, we found that both CNS1− and control Treg cells stimulated in vitro through the TCR and IL-2 receptor exhibited similarly robust Gata3 induction (Supplementary Fig. 12). Thus, we suggest that increased Gata3 expression in iTreg cells, a likely consequence of their generation in response to high-affinity TCR ligands present in the gut, endows these cells with the capacity to efficiently control spontaneous mucosal Th2 inflammation.

Certain commensal bacteria increase the frequencies of Treg cells in the gut and provide antigens recognized by a considerable proportion of iTreg TCRs. In addition to TCR ligands, the gut microbial community
Figure 3 | iTreg cell deficiency leads to Th2-type gastrointestinal pathology and altered microbial communities. a, Body weights of 9-12-month-old (left) or 2.5-month-old individually housed (right) CNS1− (KO) and littermate control (WT) mice (n = 12). b, Plasmacytic enteritis (arrowhead) in CNS1-deficient mice revealed by haematoxylin and eosin staining of small intestine from 9-12-month-old CNS1− (bottom and right) and littermate control mice (top). An early crypt abscess is indicated (asterisk). Data are representative of ≥20 mice analysed. c, Percentage of Foxp3+CD4+ cells expressing Gata3 in 3-month-old mice. d, Percentage of total 16S rRNA gene sequences of the Firmicutes and Bacteroidetes phyla in stool from individually housed CNS1− (n = 9) and WT (n = 6) littermate mice. All data are representative of three or more independent experiments with ≥3 mice per group. Error bars show s.d. (a, c) or s.e.m. (d). *P < 0.05, **P < 0.01, ***P < 0.001, as calculated by Student's t-test. Scale bars, 150 µm.
Figure 4 | Unprovoked asthma-like airway pathology in CNS1-deficient mice. a, Representative haematoxylin and eosin-stained lung sections from CNS1− (top) and WT (bottom) mice. The CNS1− lung has marked peribronchiolar inflammation (arrowhead). The reduced lumen (L) contains mucus produced by the hyperplastic respiratory epithelium (E). Arrows indicate reactive (top) and normal (bottom) endothelium. Insets in the bottom right-hand corner are higher magnifications of boxed regions, and the bar indicates smooth muscle thickness. The top right inset (KO) demonstrates eosinophilic crystals. Asterisk marks acidophilic macrophages. b, Periodic acid Schiff with Alcian blue staining highlighting mucus-producing goblet cells (dark blue-purple). c, Trichrome staining illustrating lung fibrosis (blue staining). d, Arginase-1 staining of lungs from CNS1− and WT mice. A indicates airway; an acidophilic crystal is marked by the arrowhead. e, Chitinase 3-like 3 (Chi3l3) staining of lungs from CNS1− and WT mice at ×10 magnification (top), and ×60 magnification of lungs from CNS1− mice demonstrating robust Chi3l3 expression within acidophilic macrophages (bottom). f, Lung resistance (left) and compliance (right) of CNS1− and WT littermate control mice after exposure to methacholine. Data representative of two independent experiments with ≥4 mice per group. Error bars, s.d.; *P < 0.05, **P < 0.01, ***P < 0.001, as calculated by Student's t-test. Scale bars, 100 µm.


also contributes to the local cytokine environment, which facilitates iTreg cell differentiation and maintenance in the gut. These observations raise the question of whether iTreg cells, in turn, influence the composition of the commensal microbiota. To address this question, we sequenced 16S ribosomal RNA coding genes from bacterial contents of stool samples isolated from CNS1− and CNS1+ littermates, which were housed individually for 5 weeks after weaning. Phylogenetic analysis revealed distinct gut microbial communities in CNS1− mice, with statistically significant enrichment of the candidate phylum TM7 and the Bacteroidetes genus Alistipes (Supplementary Fig. 16), and an overall decrease in the ratio of Firmicutes to Bacteroidetes (2.60 in wild-type and 1.51 in knockout mice) (Fig. 3d). Interestingly, the opposite trend in the Firmicutes/Bacteroidetes ratio has been correlated with obesity. This raises the possibility that alterations in energy harvest and metabolism, caused by inflammation or by microbe-dependent effects on energy balance, could account for the decreased weight observed in iTreg cell-deficient mice. Thus, iTreg cells help maintain a 'normal' microbial community in the gut, probably by exerting control over Th2 mucosal inflammation.
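The Firmicutes-to-Bacteroidetes ratio reported above is a quotient of phylum-level 16S read counts. The sketch below is minimal and assumes that each read has already been assigned to a phylum. The taxonomic classification step is not shown, and the read labels are invented. The source does not state whether ratios were computed per mouse or on pooled reads.

```python
from collections import Counter


def phylum_fractions(phylum_labels):
    """Return the fraction of 16S reads assigned to each phylum."""
    counts = Counter(phylum_labels)
    total = sum(counts.values())
    return {phylum: n / total for phylum, n in counts.items()}


def firmicutes_to_bacteroidetes(phylum_labels):
    """Ratio of Firmicutes to Bacteroidetes reads.

    Raises ZeroDivisionError if no Bacteroidetes reads are present.
    """
    counts = Counter(phylum_labels)
    return counts["Firmicutes"] / counts["Bacteroidetes"]


# Invented example: 13 Firmicutes, 5 Bacteroidetes and 2 TM7 reads.
reads = ["Firmicutes"] * 13 + ["Bacteroidetes"] * 5 + ["TM7"] * 2
print(phylum_fractions(reads)["Firmicutes"])  # 0.65
print(firmicutes_to_bacteroidetes(reads))     # 2.6
```

Because both phyla share the same denominator, the ratio of counts equals the ratio of fractions. Reads from other phyla, such as TM7, change each phylum's fraction but not the ratio.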

These observations raised the question of whether the altered microbiota, rather than iTreg deficiency, was the direct cause of the observed Th2 inflammation. To equalize gut microbiota, CNS1− mice and littermate controls were treated with antibiotics (metronidazole and ciprofloxacin) for 4 weeks. Despite indistinguishable microbial communities, antibiotic treatment did not lead to a decrease in Gata3 expression or Th2 cytokine production by effector T cells in CNS1− mice, and characteristic histopathologic features were maintained (Supplementary Fig. 17). Furthermore, iTreg cell-sufficient germ-free mice colonized with CNS1− or control microbiota exhibited a similar spectrum of Th1, Th2 and Th17 cytokine production and eventual normalization of microbiota (Supplementary Fig. 18 and data not shown). These results suggest that iTreg deficiency results in immune dysregulation and Th2 inflammation in the gut, with subsequent perturbation of the microbial community.

If iTreg cells have a specialized function in suppressing Th2 responses at mucosal sites, one would expect to observe Th2-type pathology in the lungs of CNS1− mice, despite only a modest ~20-25% decrease in numbers of Treg cells in this tissue compared to littermate controls (Fig. 1c). Indeed, we discovered that CNS1− mice suffer from spontaneous Th2-type airway inflammation (Fig. 4 and Supplementary Fig. 19). The lungs of CNS1− mice were characterized by increased infiltration by lymphocytes, plasma cells and macrophages, and by moderate neutrophil infiltration (Fig. 4). The consistent features of the chronic inflammatory airway disease observed in CNS1− mice include lymphocytic infiltration, narrowed airway lumen (Fig. 4a), increased goblet cells and mucus production (Fig. 4a, b), smooth muscle hyperplasia, and fibrosis (Fig. 4c). Notably, 9/12 CNS1− and 0/6 CNS1+ mice developed acidophilic macrophage pneumonia (AMP), with characteristic increases in acidophilic macrophages and both intracellular and extracellular chitinase 3-like 3 crystals (Chi3l3, formerly Ym1), analogous to the Charcot-Leyden crystals found in asthmatic patients (Fig. 4a, e). In addition, the prominent presence of alternatively activated macrophages in the lungs of CNS1− mice was confirmed by morphology and by expression of arginase 1 in addition to Chi3l3 (Fig. 4d and Supplementary Fig. 20). Furthermore, both young (6-8-week-old) and aged (20-week-old) CNS1− mice exhibited airway hyper-responsiveness accompanied by AMP, perivascular, peribronchiolar and intramucosal inflammation, bronchial epithelial hyperplasia, and airway narrowing (Fig. 4f and Supplementary Fig. 21). These spontaneous lesions are especially striking considering the Th2-resistant, Th1-prone C57BL/6 genetic background of CNS1− mice. The lung pathology in CNS1− mice reflects the hallmark features of chronic allergic inflammation and asthma.

Our results demonstrate that Treg cells of thymic and extrathymic origin have distinct mechanistic requirements for differentiation and exert specialized functions in immune homeostasis. The restriction of lesions to mucosal tissues in iTreg-deficient mice implies that, under steady-state conditions, Treg cells generated in the thymus are largely sufficient for control of most immune responses to self-antigens. These findings suggest that in normal animals, Treg cells generated extrathymically in a CNS1-dependent manner play a non-redundant role in control of mucosal allergic Th2 inflammation and asthma.


METHODS SUMMARY 

The generation of the following mouse strains has been previously described: CNS1− (Foxp3ΔCNS1), Foxp3gfp and Foxp3eGFP-cre-ERT2 R26Y. Rag1−/− mice were purchased from The Jackson Laboratory. CD45.1 B6 and Tcrb−/−Tcrd−/− mice, along with the above strains, were maintained in the Sloan Kettering Institute Research Laboratories animal facility in accordance with institutional regulations. Tissues for histologic analysis were fixed in 10% phosphate-buffered formalin and processed routinely for staining. In vitro induction assays were performed with 5 × 10^4 Foxp3-GFP− CD4+ T cells, 5 µg ml−1 of anti-CD3 and anti-CD28 antibody and 100 U ml−1 IL-2, in 96-well flat-bottom plates. For in vitro and transfer experiments, CD4+ T cells were pre-enriched using mouse CD4 Dynabeads (L3T4, Invitrogen) and FACS sorted on an LSR-II (BD Biosciences). Intracellular staining for IL-4 used Cytofix/Cytoperm (BD Biosciences), and staining for other cytokines, Foxp3 and Gata3 used the Foxp3 staining kit (eBiosciences). For measurement of airway hyper-responsiveness (AHR), mice were anaesthetized with pentobarbital, and AHR was assessed by invasive measurement of airway resistance using a modified version of a described method (Buxco Electronics). 16S rRNA sequencing was performed on a 454 GS FLX Titanium pyrosequencing platform following the Roche 454 recommended procedures.


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS

Mice. The generation of the following mouse strains has been previously described: CNS1− (Foxp3ΔCNS1), Foxp3gfp and Foxp3eGFP-cre-ERT2 R26Y. Rag1−/− mice were purchased from The Jackson Laboratory. CD45.1 B6 and Tcrb−/−Tcrd−/− mice, along with the above strains, were maintained in the Sloan Kettering Institute Research Laboratories animal facility in accordance with institutional regulations. Mice were killed by CO2 asphyxiation. EAE was induced and scored as previously described. For antibiotic treatment, CNS1-deficient and CNS1-sufficient mice were treated with 1 g l−1 metronidazole (Sigma-Aldrich) and 0.2 g l−1 ciprofloxacin (ENZO Life Sciences International) dissolved in drinking water for 4 weeks. Mouse anti-CD20 (MB20-11, provided by T. Tedder) and anti-IL-4 (11b.11, NCI-Frederick) were administered weekly as intraperitoneal injections of 50 µg or 5 µg, respectively, for 3 weeks.

Cell isolation, transfer and FACS staining. For in vitro and in vivo transfer experiments, CD4+ T cells were pre-enriched using mouse CD4 Dynabeads (L3T4, Invitrogen) and FACS sorted on an LSR-II (BD Biosciences). Intracellular staining for IL-4 used Cytofix/Cytoperm following treatment with Golgi-Stop (BD Biosciences). Staining for other cytokines (following treatment with Golgi-Plug, BD Biosciences) and for Foxp3 and Gata3 used the Foxp3 staining kit (eBiosciences).
In vitro assays. In vitro induction assays were performed with 5 × 10^4 Foxp3-GFP− CD4+ T cells, 5 µg ml−1 of anti-CD3 and anti-CD28 antibody and 100 U ml−1 IL-2, in 96-well flat-bottom plates. For in vitro suppression assays, 4 × 10^4 CD4+Foxp3−CD62Lhi naive T cells FACS purified from WT mice were cultured with graded numbers of CD4+Foxp3+ Treg cells FACS purified from Foxp3ΔCNS1 or Foxp3gfp mice, in the presence of 10^5 irradiated T cell-depleted splenocytes and 1 µg ml−1 anti-CD3 antibody, in a 96-well round-bottom plate for 80 h. Cell proliferation was assessed by [3H]thymidine incorporation during the final 8 h of culture.
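The suppression assay reads out responder-cell proliferation as [3H]thymidine counts per minute (c.p.m.) across graded Treg cell numbers. The Methods do not state how suppression was summarized. The sketch below assumes one common convention: percent suppression relative to responders cultured without Treg cells. The c.p.m. values are invented for illustration.

```python
def percent_suppression(cpm_with_treg: float, cpm_responders_alone: float) -> float:
    """Assumed convention: (1 - cpm_with_treg / cpm_responders_alone) * 100."""
    if cpm_responders_alone <= 0:
        raise ValueError("baseline proliferation must be positive")
    return (1.0 - cpm_with_treg / cpm_responders_alone) * 100.0


# Invented c.p.m. values for graded Treg:responder ratios.
baseline_cpm = 40000.0  # responders cultured without Treg cells
cpm_by_ratio = {"1:1": 8000.0, "1:2": 16000.0, "1:4": 28000.0}
for ratio, cpm in cpm_by_ratio.items():
    print(ratio, f"{percent_suppression(cpm, baseline_cpm):.1f}%")
```

With these invented counts, suppression falls from 80% to 30% as Treg cells are diluted. This is the graded pattern such assays are designed to reveal.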

Histology and immunohistochemistry. Necropsies were performed, and sections of pancreas, stomach, heart, lungs, kidney, external ear and haired skin were fixed in 10% phosphate-buffered formalin. Tissues were processed routinely for staining with haematoxylin and eosin, periodic acid Schiff with Alcian blue, or Masson trichrome where indicated. Slides were examined by an American Board of Veterinary Practitioners-certified veterinary pathologist blinded to genotypes. Morphological diagnoses were applied for all tissues. Immunohistochemical staining was performed by the University of Washington Histology and Imaging Core using standard protocols with a Leica Bond Automated Immunostainer. Primary antibodies: goat anti-mouse chitinase 3-like 3/ECF-L (YM1) (R&D Systems, cat. no. AF2446, lot no. UNU01), 0.2 µg ml−1; rabbit polyclonal anti-iNOS/NOS II, NT (Millipore, cat. no. 06-573), 1 µg ml−1; rabbit polyclonal anti-arginase 1 (H-52) (Santa Cruz, cat. no. sc-20150, lot no. K0807), 0.2 µg ml−1. Isotype controls were used at the same concentration as the primary antibody. All antibodies were run with Leica Bond reagents and Bond Polymer Refine (DAB) detection with haematoxylin counterstain.

Histology inflammation scoring. 0, none; 1, focal or multifocal mild perivascular accumulations with minimal extension into surrounding adventitia or parenchyma; 2, multifocal mild or focal moderate perivascular accumulations with mild extension into surrounding parenchyma, or mild to moderate parenchymal accumulations; 3, grade 2 plus mild inflammation-associated parenchymal lesions such as loss or degeneration of cells; 4, grade 2 plus moderate to severe inflammation-associated parenchymal lesions. Inflammation in the gastrointestinal tract was scored as described previously.

Airway hyperresponsiveness measurements. For measurement of AHR, mice
were anaesthetized with pentobarbital (7.5-10 mg per mouse) and AHR was
assessed by invasive measurement of airway resistance using a modified version
of a previously described method (Buxco Electronics). Mice were ventilated at a tidal volume
of 0.2 ml with a ventilator (Harvard Apparatus) at a frequency of approximately
150 breaths per min. Baseline pulmonary mechanics and responses to ventilated saline
(0.9% NaCl) were measured, and lung resistance (RL) was measured in response
to increasing doses (0.125-40 mg ml⁻¹) of acetyl-β-methylcholine chloride
(methacholine; MCh) (Sigma-Aldrich). The three RL values obtained after each
dose of methacholine were averaged to obtain the final value for each dose.
Results are expressed as the percentage increase over the saline baseline. Following
measurement of AHR, mouse tracheas were cannulated and the lungs were
lavaged twice with 1 ml of PBS containing 2% FCS, and the fluids were pooled. Cells in the
lavage fluid were counted using a haemocytometer, and BAL cell differential
counts were determined on slide preparations stained with DiffQuik. At least
200 cells were differentiated on stained slides by light microscopy using conventional
morphological criteria. For some experiments, BAL cells from individual mice or
pooled BAL cells were stained and analysed by flow cytometry.
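
The averaging and baseline normalisation described above are simple arithmetic. The Python sketch below assumes that "percentage increase over the saline baseline" means (mean RL at a dose minus baseline RL) / baseline RL × 100, and that absolute BAL cell numbers are obtained by scaling the slide differential proportions by the haemocytometer total. The function names and example values are hypothetical, not data from the study.

```python
from statistics import mean


def percent_increase_over_baseline(dose_readings, saline_baseline):
    """Average the replicate RL readings for one methacholine dose and express
    the mean as a percentage increase over the saline baseline.

    Assumption: (mean_dose - baseline) / baseline * 100.
    """
    if saline_baseline <= 0:
        raise ValueError("saline baseline must be positive")
    return (mean(dose_readings) - saline_baseline) / saline_baseline * 100


def bal_absolute_counts(total_cells, differential, min_counted=200):
    """Scale a slide differential (cells of each type among those classified)
    to absolute numbers per lavage, requiring at least `min_counted` cells."""
    counted = sum(differential.values())
    if counted < min_counted:
        raise ValueError(f"only {counted} cells differentiated; need at least {min_counted}")
    return {cell_type: total_cells * n / counted for cell_type, n in differential.items()}


# Hypothetical example values.
print(percent_increase_over_baseline([1.5, 2.0, 2.5], saline_baseline=1.0))  # 100.0
print(bal_absolute_counts(4e5, {"macrophages": 150, "eosinophils": 40, "neutrophils": 10}))
```

Keeping the 200-cell minimum as a guard mirrors the counting rule in the text; a real analysis would also track each mouse and dose separately.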

Stool sample collection. Defecation was induced in live CNS1 and control mice, and
fresh stool samples were collected directly into sterile collection tubes and snap frozen before
preparation of material for sequencing (see below).


DNA extraction. DNA extraction was performed on each fecal specimen using
phenol-chloroform extraction with mechanical disruption based on a previously
described protocol. Briefly, an aliquot (~500 mg) of each sample was suspended
in a solution containing 500 µl of extraction buffer (200 mM Tris, pH 8.0; 200 mM
NaCl; and 20 mM EDTA), 210 µl of 20% SDS, 500 µl of phenol/chloroform/isoamyl
alcohol (25:24:1) and 500 µl of 0.1-mm-diameter zirconia/silica beads
(BioSpec Products). Microbial cells were lysed by mechanical disruption with a
bead beater (BioSpec Products) for 2 min, after which two rounds of phenol/chloroform/isoamyl
alcohol extraction were performed. DNA was precipitated
with ethanol and resuspended in 50 µl of nuclease-free water. DNA was subjected
to additional purification with the QIAamp DNA Mini Kit (Qiagen).

PCR amplification and sequencing. For each sample, three replicate 25-µl PCR
amplifications were performed, each containing 5 ng of purified DNA, 0.2 mM
dNTPs, 1.5 mM MgCl2, 1.25 U Platinum Taq DNA polymerase, 2.5 µl of 10× PCR
buffer, and 0.2 µM each of broad-range bacterial forward and reverse primers, as
described previously, flanking the V1-V3 variable region. The primers were
modified to include adaptor sequences required for 454 sequencing, with the
addition of a unique 6-8-base barcode in the reverse primer. The forward primer
(5'-CCTATCCCCTGTGTGCCTTGGCAGTCTCAGAGTTTGATCCTGGCTCAG-3')
consisted of the 454 Lib-L primer B (the first 30 nucleotides) followed by the
broad-range universal bacterial primer 8F (the final 18 nucleotides); the reverse primer
(5'-CCATCTCATCCCTGCGTGTCTCCGACTCAGNNNNNNNATTACCGCGGCTGCTGG-3')
consisted of the 454 Lib-L primer A (the first 30 nucleotides), the barcode (NNNNNNN)
and the broad-range primer 534R (the final 17 nucleotides). The cycling conditions were
94 °C for 3 min, followed by 25 cycles of 94 °C for 30 s, 56 °C for 30 s and 72 °C for 1 min.
The three replicate PCR products were pooled and purified using the QIAquick PCR
Purification Kit (Qiagen). The purified PCR products were sequenced unidirectionally
on a 454 GS FLX Titanium pyrosequencing platform following the Roche
454 recommended procedures.
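
The primer layout described above is a concatenation of fixed segments around a per-sample barcode. The Python sketch below assumes the standard 30-nucleotide 454 Lib-L adaptor sequences, which match the first 30 nucleotides of each primer quoted above. The example barcode and the helper names are hypothetical and for illustration only.

```python
# Segment sequences taken from the primers quoted in the text
# (assumes the standard 30-nt 454 Lib-L adaptor boundaries).
LIB_L_PRIMER_A = "CCATCTCATCCCTGCGTGTCTCCGACTCAG"
LIB_L_PRIMER_B = "CCTATCCCCTGTGTGCCTTGGCAGTCTCAG"
PRIMER_8F = "AGTTTGATCCTGGCTCAG"
PRIMER_534R = "ATTACCGCGGCTGCTGG"


def forward_primer():
    """Lib-L primer B followed by the universal 16S primer 8F."""
    return LIB_L_PRIMER_B + PRIMER_8F


def reverse_primer(barcode):
    """Lib-L primer A, then a 6-8 nt sample barcode, then primer 534R."""
    barcode = barcode.upper()
    if not 6 <= len(barcode) <= 8 or not set(barcode) <= set("ACGT"):
        raise ValueError("barcode must be 6-8 nt of A/C/G/T")
    return LIB_L_PRIMER_A + barcode + PRIMER_534R


def barcode_of(primer):
    """Recover the barcode from an assembled reverse primer (hypothetical helper)."""
    if not (primer.startswith(LIB_L_PRIMER_A) and primer.endswith(PRIMER_534R)):
        raise ValueError("not a primer A / 534R construct")
    return primer[len(LIB_L_PRIMER_A):-len(PRIMER_534R)]


rev = reverse_primer("ACGTACG")  # hypothetical 7-nt barcode
print(len(forward_primer()), len(rev))  # 48 54
print(barcode_of(rev))  # ACGTACG
```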

Sequence processing and analysis. Sequences were converted to standard FASTA
format using Vendor 454 software. Sequences shorter than 200 base pairs (bp),
containing undetermined bases or homopolymer stretches longer than 8 bp, or
failing to align with the V1-V3 region were excluded from the analysis. Using the
454 base quality scores, which range from 0 to 40 (0 being an ambiguous base),
sequences were trimmed using a sliding-window technique so that the average
quality score over any window of 50 bases never dropped below 35; bases were
removed from the 3' end until this criterion was met. Sequences were aligned
to the V1-V3 region of the 16S rRNA gene using the SILVA reference
alignment as a template and the Needleman-Wunsch algorithm with default scoring options.
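
The length, ambiguous-base, homopolymer and sliding-window quality rules above can be sketched in Python as follows. This is an illustrative reimplementation under stated assumptions, not the vendor's or the authors' code: the alignment-based V1-V3 check is omitted, and applying the length filter after trimming is an assumption about step order.

```python
from itertools import groupby


def longest_homopolymer(seq):
    """Length of the longest run of one repeated base (0 for an empty read)."""
    return max((len(list(run)) for _, run in groupby(seq)), default=0)


def passes_read_filters(seq, min_length=200, max_homopolymer=8):
    """Length, ambiguous-base (N) and homopolymer rules; the V1-V3 alignment
    check is a separate step and is not modelled here."""
    seq = seq.upper()
    return (
        len(seq) >= min_length
        and "N" not in seq
        and longest_homopolymer(seq) <= max_homopolymer
    )


def trim_3prime_by_window(seq, quals, window=50, min_mean=35):
    """Trim bases from the 3' end until every complete window of `window`
    bases has a mean quality of at least `min_mean`.

    Removing bases one at a time from the 3' end stops at the longest prefix
    that contains no failing window. That prefix ends one base before the
    end of the leftmost failing window, so a single left-to-right scan finds it.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    for start in range(len(quals) - window + 1):
        if sum(quals[start:start + window]) / window < min_mean:
            cut = start + window - 1
            return seq[:cut], quals[:cut]
    return seq, quals


# Hypothetical read: 100 high-quality bases followed by 60 poor ones.
read, quals = "ACGT" * 40, [40] * 100 + [10] * 60
trimmed, _ = trim_3prime_by_window(read, quals)
print(len(trimmed))  # 108
print(passes_read_filters(trimmed))  # False (shorter than 200 bp after trimming)
```
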
Potentially chimaeric sequences were removed using the UCHIME chimaera-detection
program. Sequences were grouped into operational taxonomic units (OTUs) using
the average neighbour algorithm; sequences with distance-based similarity of 97%
or greater were assigned to the same OTU. For each fecal sample, OTU-based
microbial diversity was estimated by calculating the Shannon diversity index.
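
The OTU and diversity steps above can be illustrated with a small Python sketch. It assumes that "average neighbour" means average-linkage (UPGMA-style) agglomerative clustering and, as a simplification of real pipelines, uses the plain mismatch fraction between already-aligned, equal-length reads as the distance. The example reads are hypothetical, and the naive loop is only suitable for tiny inputs.

```python
import math


def p_distance(a, b):
    """Fraction of mismatched positions between two aligned, equal-length reads."""
    if len(a) != len(b) or not a:
        raise ValueError("reads must be non-empty and aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def average_neighbour_otus(reads, max_distance=0.03):
    """Average-linkage (UPGMA-style) agglomerative clustering.

    Repeatedly merges the two clusters whose mean pairwise distance is smallest,
    stopping once that mean exceeds max_distance (similarity below 97%).
    Returns clusters as lists of read indices. Slow; for tiny examples only.
    """
    n = len(reads)
    dist = [[p_distance(reads[i], reads[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [dist[i][j] for i in clusters[a] for j in clusters[b]]
                avg = sum(pairs) / len(pairs)
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        avg, a, b = best
        if avg > max_distance:
            break
        clusters[a].extend(clusters[b])
        del clusters[b]  # b > a, so the index of the merged cluster is unchanged
    return clusters


def shannon_index(abundances):
    """Shannon diversity H' = -sum(p_i * ln(p_i)) over OTUs with nonzero counts."""
    total = sum(abundances)
    return -sum((c / total) * math.log(c / total) for c in abundances if c > 0)


reads = ["A" * 100, "A" * 99 + "C", "C" * 100]  # hypothetical aligned reads
otus = average_neighbour_otus(reads)
print(otus)  # [[0, 1], [2]]
print(round(shannon_index([len(otu) for otu in otus]), 4))  # 0.6365
```
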
Phylogenetic classification to genus level was performed for each sequence using
the Bayesian classifier algorithm described by Wang and colleagues and a
database of known 16S sequences generated by the Ribosomal Database Project
(RDP). For each experiment, data were analysed at each taxonomic level individually.
The count data were rescaled using the DESeq R package. Taxa with a mean
count below 10 in both conditions were removed from further analysis, and
taxa with statistically significant differences between two conditions (for example,
WT and KO) were determined using the binomial test from the DESeq package.
Taxa with a fold change greater than two and FDR ≤ 0.05 were declared
significant.
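
DESeq is an R package, so the Python sketch below only mirrors the filtering logic described above under explicit assumptions: median-of-ratios size factors (the normalisation DESeq documents), a mean normalised count of at least 10 in one or both conditions, a fold change above two in either direction, and Benjamini-Hochberg control of the FDR at 0.05. The per-taxon test itself is not reimplemented; its p-values are taken as given, and all taxon names and numbers are hypothetical.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors, the normalisation described for DESeq.

    counts maps taxon -> list of raw counts, one per sample. Taxa with a zero
    in any sample are skipped because their geometric mean is zero.
    """
    n_samples = len(next(iter(counts.values())))
    log_geo_means = {
        taxon: sum(math.log(c) for c in row) / n_samples
        for taxon, row in counts.items()
        if all(c > 0 for c in row)
    }
    return [
        math.exp(median(math.log(counts[t][s]) - lg for t, lg in log_geo_means.items()))
        for s in range(n_samples)
    ]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_differential_taxa(norm_a, norm_b, pvalues, min_mean=10.0, min_fold=2.0, fdr=0.05):
    """Keep taxa whose mean normalised count reaches min_mean in at least one
    condition, adjust their p-values, and report those with a fold change
    above min_fold (either direction) and adjusted p-value <= fdr.

    norm_a, norm_b: taxon -> normalised counts per sample in each condition.
    pvalues: taxon -> raw p-value from an external test (assumed given).
    """
    means = {t: (sum(norm_a[t]) / len(norm_a[t]), sum(norm_b[t]) / len(norm_b[t])) for t in norm_a}
    kept = [t for t, (ma, mb) in means.items() if ma >= min_mean or mb >= min_mean]
    qvalues = dict(zip(kept, benjamini_hochberg([pvalues[t] for t in kept])))
    hits = []
    for t in kept:
        lo, hi = sorted(means[t])
        fold = hi / lo if lo > 0 else math.inf
        if fold > min_fold and qvalues[t] <= fdr:
            hits.append(t)
    return hits


sf = size_factors({"t1": [10, 20], "t2": [20, 40], "t3": [40, 80]})
print(round(sf[1] / sf[0], 6))  # 2.0
a = {"taxon_up": [5.0, 5.0], "taxon_flat": [50.0, 50.0], "taxon_rare": [1.0, 1.0]}
b = {"taxon_up": [30.0, 30.0], "taxon_flat": [55.0, 55.0], "taxon_rare": [4.0, 4.0]}
p = {"taxon_up": 0.001, "taxon_flat": 0.5, "taxon_rare": 0.001}
print(call_differential_taxa(a, b, p))  # ['taxon_up']
```
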
Cancer exome analysis reveals a T-cell-dependent mechanism of cancer immunoediting
Cancer immunoediting, the process by which the immune system
controls tumour outgrowth and shapes tumour immunogenicity,
comprises three phases: elimination, equilibrium and
escape¹⁻⁵. Although many immune components that participate
in this process are known, its underlying mechanisms remain
poorly defined. A central tenet of cancer immunoediting is that
T-cell recognition of tumour antigens drives the immunological
destruction or sculpting of a developing cancer. However, our
current understanding of tumour antigens comes largely from
analyses of cancers that develop in immunocompetent hosts and
thus may have already been edited. Little is known about the
antigens expressed in nascent tumour cells, whether they are
sufficient to induce protective antitumour immune responses or
whether their expression is modulated by the immune system.
Here, using massively parallel sequencing, we characterize
expressed mutations in highly immunogenic methylcholanthrene-induced
sarcomas derived from immunodeficient Rag2−/− mice
that phenotypically resemble nascent primary tumour cells.
Using class I prediction algorithms, we identify mutant spectrin-β2
as a potential rejection antigen of the d42m1 sarcoma and validate
this prediction by conventional antigen expression cloning and
detection. We also demonstrate that cancer immunoediting of
d42m1 occurs via a T-cell-dependent immunoselection process that
promotes outgrowth of pre-existing tumour cell clones lacking highly
antigenic mutant spectrin-β2 and other potential strong antigens.
These results demonstrate that the strong immunogenicity of an
unedited tumour can be ascribed to expression of highly antigenic
mutant proteins and show that outgrowth of tumour cells that lack
these strong antigens via a T-cell-dependent immunoselection process
represents one mechanism of cancer immunoediting.

For this study, we chose two representative, highly immunogenic,
unedited methylcholanthrene (MCA)-induced sarcoma cell lines,
d42m1 and H31m1, derived from immunodeficient Rag2−/− mice¹.
Both grow progressively when transplanted orthotopically into
Rag2−/− mice, but are rejected when transplanted into naive wild-type
mice (Supplementary Figs 1 and 2). Using a modified form of exome
sequencing involving complementary DNA (cDNA) capture by
mouse exome probes and Illumina deep sequencing (that is, cDNA
capture sequencing or cDNA CapSeq), we identified 3,737 somatic,
non-synonymous mutations in d42m1 cells (3,398 missense, 221 nonsense,
2 nonstop and 116 splice site mutations) and 2,677 non-synonymous
mutations in H31m1 cells (2,391 missense, 160 nonsense,
3 nonstop and 123 splice site mutations) (Fig. 1a, Supplementary
Fig. 3 and Supplementary Table 1). The mutations in each cell line
were largely distinct (d42m1 and H31m1 share only 119 identical
missense mutations; Fig. 1b and Supplementary Table 2), a result
that potentially explains the unique antigenicity of each cell line
(Supplementary Fig. 4). Although d42m1 and H31m1 display mutations
in known cancer genes⁶, the functional effects of these novel
mutations remain undefined. Nevertheless, both tumours have
cancer-causing mutations in Kras (codon 12) and Trp53 that are
frequently observed in human and mouse cancers⁷⁻⁹ (Supplementary
Table 3). The mutation calls were confirmed by independent Roche/454
pyrosequencing of 22 genes using tumour genomic DNA and by
documenting their absence in normal cells from the same mouse that
developed the tumour (Supplementary Table 4).

Comparing cDNA CapSeq data of d42m1 and H31m1 cells to
human cancer genomes¹⁰⁻¹⁷ revealed two similarities. First, 46-47%
of mutations in d42m1 and H31m1 are C/A or G/T transversions,
which represent chemical-carcinogen signatures¹³,¹⁴ similar to those
of lung cancers from smokers (44-46%) but not seen in human cancers
induced by other mechanisms (8-16%) (Fig. 1c). Second, the mutation
rates of d42m1 and H31m1 are about tenfold higher than those of lung
cancers from smokers, but within threefold of hypermutator smoker
lung cancers with mutations in DNA repair pathway genes (Fig. 1d).
Interestingly, d42m1 and H31m1 also show mutations in DNA repair
genes (Supplementary Table 3), although these novel mutations have
not been functionally characterized. Thus, mouse MCA-induced
sarcomas have qualitative and quantitative genomic similarities to
carcinogen-induced human cancers.
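
The C/A plus G/T fraction quoted above treats a substitution and its reverse complement as one class, because C>A on one strand is G>T on the other. The Python sketch below shows one common way to compute such a strand-collapsed spectrum and a per-megabase mutation rate. It is an assumption-laden illustration, not the authors' pipeline; the inputs are hypothetical.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def substitution_class(ref, alt):
    """Map a single-base substitution onto the pyrimidine (C or T) reference
    base, giving one of six classes: C>A, C>G, C>T, T>A, T>C, T>G.
    For example G>T is reported as C>A, its reverse complement."""
    ref, alt = ref.upper(), alt.upper()
    if len(ref) != 1 or len(alt) != 1 or ref == alt or not {ref, alt} <= set("ACGT"):
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    if ref in "AG":  # purine reference: describe the change on the other strand
        ref, alt = ref.translate(_COMPLEMENT), alt.translate(_COMPLEMENT)
    return f"{ref}>{alt}"


def spectrum(substitutions):
    """Fraction of (ref, alt) substitutions falling in each class."""
    counts = Counter(substitution_class(ref, alt) for ref, alt in substitutions)
    total = sum(counts.values())
    return {cls: n / total for cls, n in sorted(counts.items())}


def mutations_per_mb(n_mutations, callable_bases):
    """Mutation rate per megabase of sequence in which mutations could be called."""
    return n_mutations * 1e6 / callable_bases


# Hypothetical inputs, not data from the study.
muts = [("C", "A"), ("G", "T"), ("C", "T"), ("A", "G")]
print(spectrum(muts))  # {'C>A': 0.5, 'C>T': 0.25, 'T>C': 0.25}
print(mutations_per_mb(300, 30_000_000))  # 10.0
```

Comparing rates between tumours, as in the text, only makes sense when each rate is divided by the territory in which mutations could actually be detected, which is why the denominator is a parameter here.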

When parental d42m1 sarcoma cells were transplanted into naive
wild-type mice, approximately 20% of recipients developed escape
tumours (Supplementary Fig. 5a, c). Cell lines made from three escape
tumours (d42m1-es1, d42m1-es2 and d42m1-es3) formed progressively
growing sarcomas when transplanted into naive wild-type
recipients (Fig. 2a). In contrast, parental d42m1 tumour cells passaged
through Rag2−/− mice maintained high immunogenicity (Supplementary
Fig. 5b, d). Additional analyses revealed that whereas eight
of ten clones of d42m1 were rejected in wild-type mice, two clones
(d42m1-T3 and d42m1-T10) grew with kinetics similar to d42m1
escape tumours (Fig. 2a and Supplementary Fig. 6). Thus, the
d42m1 cell line consists mostly, but not entirely, of highly immunogenic
clones and undergoes immunoediting in wild-type mice. cDNA
CapSeq of parental d42m1 cells, clones and escape tumours revealed
that all expressed similar numbers of mutations (Supplementary
Fig. 7a and Supplementary Table 1), and phylogenetic analysis revealed
that all d42m1-derived cells were genomically related to one another
but distinct from H31m1 and normal fibroblasts (Supplementary
Fig. 7b). However, regressor clones clustered more closely to parental
d42m1 cells, whereas progressor clones clustered more closely to cells
from escape tumours. Thus, the d42m1 tumour cell line consists of a
related but heterogeneous population of tumour cells.

Figure 1 | Unedited MCA-induced sarcomas d42m1 and H31m1
genomically resemble carcinogen-induced human cancers. a, Number of
non-synonymous mutations in d42m1 and H31m1 tumour cells as detected by
cDNA CapSeq. SNV, single nucleotide variant. b, Missense mutations
compared between d42m1 and H31m1 that had at least 20× sequencing
coverage. c, Spectrum of DNA nucleotide substitutions detected in d42m1 and
H31m1 as compared to previously generated data from human cancers,
including acute myelogenous leukaemia¹⁰ (AML), chronic lymphocytic
leukaemia¹⁶ (CLL), breast cancer (breast lobular¹², breast basal¹¹), ovarian
cancer (E. R. Mardis et al., manuscript in preparation), liver cancer (hepatitis C
virus (HCV)-positive)¹⁵, melanoma (ultraviolet (UV)-induced)¹⁷ and lung
cancers (non-small cell (NSC)¹³, small cell (SC)¹⁴, never-smoker, smoker and
hypermutator (E. R. Mardis et al., manuscript in preparation)). d, Mutation
rates for d42m1, H31m1 and the human cancers described in c, including tumours
from never-smoker 1 (bronchioloalveolar carcinoma) and never-smoker 2
(lung adenocarcinoma).

Tumour-specific mutant proteins presented on mouse or human
MHC class I molecules are known to represent one class of tumour-specific
antigens for CD8+ T cells¹⁸,¹⁹. Therefore, we used in silico
analysis²⁰ to assess the theoretical capacities of missense mutations
from d42m1-related tumour cells to bind MHC class I proteins.
Each d42m1-related cell type expressed many potential high-affinity
epitopes (half-maximal inhibitory concentration (IC50) < 50 nM; affinity
value (1/IC50 × 100) > 2) that could bind to H-2Db or
H-2Kb (Fig. 2b). Of these, 39-42 were expressed only in the regressor
subset of d42m1-related cells (7-9 for H-2Db, 30-35 for H-2Kb),
including 31 expressed in all regressor cells (Supplementary Table 5).
Thus, ~1% of the missense mutations in d42m1 are selectively
expressed in rejectable d42m1 clones.
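
The affinity threshold above is a unit conversion: with IC50 in nM, 100/IC50 > 2 is the same condition as IC50 < 50 nM. The Python sketch below encodes that definition, plus a hypothetical helper that keeps strong binders predicted in every regressor line but in no progressor line, one plausible way such a regressor-restricted list could be assembled. The helper, cell-line names, peptide names and IC50 values are made up; the predictions themselves would come from an external tool such as the resource named in the Methods Summary.

```python
def affinity_value(ic50_nm):
    """Affinity value as defined in the text: (1 / IC50) x 100, with IC50 in nM."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return 100.0 / ic50_nm


def is_high_affinity(ic50_nm, threshold=2.0):
    """Affinity value > 2 is equivalent to a predicted IC50 below 50 nM."""
    return affinity_value(ic50_nm) > threshold


def regressor_only_epitopes(predictions, regressor_lines, progressor_lines):
    """Hypothetical helper: high-affinity peptides predicted in every regressor
    line and in no progressor line.

    predictions maps cell line -> {peptide: predicted IC50 in nM}.
    At least one regressor line is required.
    """
    strong_sets = [
        {pep for pep, ic50 in predictions[line].items() if is_high_affinity(ic50)}
        for line in regressor_lines
    ]
    in_every_regressor = set.intersection(*strong_sets)
    in_any_progressor = set().union(*(predictions[line].keys() for line in progressor_lines))
    return in_every_regressor - in_any_progressor


preds = {  # hypothetical predictions
    "regressor_1": {"pep_a": 10.0, "pep_b": 40.0, "pep_c": 400.0},
    "regressor_2": {"pep_a": 10.0, "pep_b": 40.0},
    "progressor_1": {"pep_b": 40.0},
}
print(affinity_value(50.0))  # 2.0
print(regressor_only_epitopes(preds, ["regressor_1", "regressor_2"], ["progressor_1"]))  # {'pep_a'}
```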




Whereas parental and regressor d42m1 cells stimulated interferon-γ
(IFN-γ) release in vitro when incubated with a specific CD8+ cytotoxic
T lymphocyte (CTL) clone (C3) derived from a wild-type mouse that
had rejected parental d42m1 tumour cells (Fig. 3a, b), progressor
d42m1 clones, cells from escape tumours or unrelated MCA sarcomas
did not. This result demonstrated that all regressor d42m1 tumour cells
share a mutation that forms the epitope recognized by C3 CTLs. As
recognition of d42m1 regressor cells by C3 CTLs is restricted by H-2Db
(Fig. 3c), we postulated that an R913L mutation in spectrin-β2
produced the most likely target for C3 CTLs, because its expression was
restricted to d42m1 regressor clones and it formed an epitope predicted
to bind H-2Db with high affinity, whereas the wild-type sequence was
predicted to bind with low affinity (Fig. 3d and
Supplementary Table 5).

To verify the importance of mutant spectrin-β2 for d42m1 antigenicity,
we independently identified the tumour antigen recognized
by the C3 CTL clone using a T-cell-based expression cloning
approach²¹. After three screening rounds, a single positive cDNA
was identified encoding a sequence identical to the R913L spectrin-β2
mutant (Fig. 3e). Thus, conventional antigen expression cloning identified
the same mutation predicted by the genomic sequencing.

Mutation-specific real-time quantitative polymerase chain reaction
with reverse transcription (qRT-PCR) revealed the presence of mutant
spectrin-β2 messenger RNA in parental d42m1 tumour cells and
regressor d42m1 clones, but not in progressor d42m1 clones or escape
tumours (Fig. 3f), nor in normal tissue of the mouse from which the
d42m1 tumour was derived (Supplementary Table 4 and Supplementary
Fig. 8). Additionally, C3 CTLs discriminated between
mutant and wild-type spectrin-β2 peptide sequences when presented
on an unrelated H-2Db-expressing cell line (Fig. 3g). Whereas the
mutant peptide (VAVVNQIAL; the C-terminal leucine is the mutated residue)
stimulated C3 CTLs in a dose-dependent manner, the wild-type
(VAVVNQIAR) peptide did not, even when added in 1,000-fold excess.
Using labelled H-2Db tetramers generated with mutant peptide, we found that
mutant spectrin-β2-specific CD8+ T cells accumulated over time in parental
d42m1 tumours developing in vivo and in draining lymph nodes before
tumour rejection (Fig. 4a, b). In contrast, no mutant spectrin-β2-specific
CD8+ T cells were detected in progressively growing escape
tumours or their draining lymph nodes. These data demonstrate that mutant
spectrin-β2, expressed selectively in a high proportion of unedited
d42m1 tumour cells, evokes a T-cell response in naive wild-type mice
that promotes the elimination of antigen-expressing tumour cells.
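
Epitope prediction of the kind described earlier scores short peptides, so each missense mutation first has to be expanded into the candidate peptides that contain it, as with the mutant and wild-type 9-mers above. The Python sketch below enumerates every k-mer spanning a substitution together with its wild-type counterpart. The 9-residue default is an assumption chosen to match the VAVVNQIAL peptide, and the example protein is a hypothetical glycine filler around residues 905-913, not the real spectrin-β2 sequence.

```python
def mutant_kmers(protein, position, ref, alt, k=9):
    """Return every k-mer that contains a missense change, paired with the
    matching wild-type k-mer, as (start_position, mutant_kmer, wildtype_kmer).

    `position` is 1-based, matching the R913L-style notation in the text.
    """
    idx = position - 1
    if not 0 <= idx < len(protein) or protein[idx] != ref:
        raise ValueError(f"residue {position} is not {ref}")
    mutant = protein[:idx] + alt + protein[idx + 1:]
    first = max(0, idx - k + 1)
    last = min(idx, len(protein) - k)
    return [(s + 1, mutant[s:s + k], protein[s:s + k]) for s in range(first, last + 1)]


# Hypothetical protein: glycine filler around the 905-913 peptide from the text.
protein = "G" * 904 + "VAVVNQIAR" + "G" * 20
windows = mutant_kmers(protein, 913, "R", "L")
print(len(windows))  # 9
print(windows[0])  # (905, 'VAVVNQIAL', 'VAVVNQIAR')
```

Each mutant/wild-type pair could then be scored separately, which is what allows a comparison such as the high predicted affinity of the mutant peptide versus the low affinity of the wild-type sequence.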

To test whether expression of mutant spectrin-β2 was sufficient to
drive rejection of d42m1 tumour cells, we enforced expression of either
mutant or wild-type spectrin-β2 in d42m1-es3 cells that lack this
mutation (Supplementary Fig. 9a) and followed their growth in
wild-type mice. Whereas d42m1-es3 tumour cell clones transduced
with either control retrovirus or retrovirus encoding wild-type spectrin-β2
(WT.1 and WT.3) grew progressively with growth kinetics similar to
unmanipulated d42m1-es3 cells, d42m1-es3 clones expressing mutant
spectrin-β2 (mu.6 and mu.14) were rejected in wild-type mice, but not
in Rag2−/− mice (Fig. 4c and Supplementary Fig. 9b, c, d). CD8+ T cells
specific for mutant spectrin-β2 did not infiltrate d42m1-es3 tumours
expressing wild-type spectrin-β2 (WT.3), but were present in d42m1-es3
tumours expressing mutant spectrin-β2 (mu.14) that were rejected
in wild-type mice (Fig. 4d). Thus, mutant spectrin-β2 is indeed a major
rejection antigen of d42m1 sarcoma cells, and d42m1 escape from
immune control is the consequence of outgrowth of d42m1 clones that
lack expression of dominant rejection antigens.

The possibility that the lack of dominant rejection antigen(s) in a
small subset of d42m1 cells was due to epigenetic silencing was ruled
out because the spectrin-β2 mutation was neither (1) found by sequencing
genomic DNA from progressor d42m1 clones or escape tumours
(Supplementary Table 4) nor (2) expressed in d42m1 progressor clones
or escape tumours after treatment with inhibitors of methyltransferases
and histone deacetylases (Supplementary Fig. 10). We therefore
asked whether T-cell-dependent immunoselection explained the outgrowth
of escape tumours. Specifically, we examined the in vivo
growth behaviour of a tumour cell mixture containing a vast majority
of highly immunogenic, mutant spectrin-β2+ d42m1-T2 cells and a
minority of mutant spectrin-β2− d42m1-T3 progressor cells. To distinguish
between the two cell types, we labelled d42m1-T2 with red
fluorescent protein (RFP) (modified to eliminate class I epitopes) and
d42m1-T3 with green fluorescent protein (GFP), and documented that
the labelling did not alter their in vivo growth characteristics. We
found that we could recapitulate the tumour growth phenotype of
parental d42m1 at a ratio of 95% d42m1-T2 cells to 5% d42m1-T3 cells
(Fig. 4e). At this ratio, 100% of Rag2−/− mice and wild-type mice
depleted of either CD4+ or CD8+ T cells developed progressively
growing tumours (Fig. 4f). In contrast, 5/20 (25%) wild-type mice
injected with the tumour cell mixture developed escape tumours, a
result that recapitulated the behaviour of parental d42m1. Tumours
harvested from Rag2−/− mice consisted of 84% d42m1-T2 cells
and 14% d42m1-T3 cells (Fig. 4h) and expressed mutant spectrin-β2
(Fig. 4g); that is, they resembled the initial 95:5 cell mixture. In contrast,
tumours that grew out in wild-type mice consisted of 98%
d42m1-T3 tumour cells and lacked mutant spectrin-β2 (Fig. 4g, h).
Thus, d42m1 escape tumours develop as a consequence of T-cell-dependent
immunoselection favouring the outgrowth of tumour cells
that lack major rejection antigens.

This report shows that the combination of cancer exome sequencing
and in silico epitope prediction algorithms can identify highly
immunogenic, tumour-specific mutational antigens in unedited
carcinogen-induced cancers that serve as targets for the elimination
phase of cancer immunoediting. To our knowledge, this is the first study
to use a genomics approach to experimentally identify a tumour antigen,
to specifically identify an antigen from an unedited tumour and to
demonstrate that T-cell-dependent immunoselection is a mechanism
underlying the outgrowth of tumour cells that lack strong rejection
antigens. This mechanism most likely also produces other types of
escape tumours, such as those that develop inactivating mutations in
antigen presentation genes (for example, those encoding MHC class I
proteins), which are frequently observed in clinically apparent human
cancers²². Developing carcinogen-induced tumours (for example,
mouse MCA sarcomas or human smoker lung cancers) may be the
preferred targets of cancer immunoediting because they express the
greatest number of mutations that might function as neoantigens.
However, as ~1% of the mutations in d42m1 are selectively expressed
in regressor tumour clones, it is possible that spontaneous tumours
arising by other means that harbour as few as 100-200 mutations could
still be susceptible to immunological sculpting as they develop. In this
regard it is significant that, as documented in a complementary study
reported in this issue²⁴, oncogene-induced primary sarcomas engineered
to express a strong model antigen can also undergo T-cell-dependent
immunoediting, resulting in the outgrowth of tumours that
escape immune control. It will be interesting in the future to compare
the effects of immunity on the antigenic profiles of oncogene- versus
carcinogen-induced tumours.

Figure 3 | Identification of mutant spectrin-β2 as an authentic antigen of an
unedited tumour. a, b, IFN-γ release by C3 CTLs following co-culture with
different unedited sarcomas (a) or d42m1-related tumours (b). c, IFN-γ release
by C3 CTLs is inhibited by monoclonal antibodies that block CD8 and H-2Db,
but not CD4 or H-2Kb. d, MHC class I epitopes predicted to be shared in all of
the regressor d42m1 tumours, but not in progressor d42m1 tumours.
e, Representation of the cDNA clone that stimulated C3 CTLs, encoding the
spectrin-β2 R913L mutation. f, qRT-PCR for mutant spectrin-β2 in d42m1-related
tumours and 1773. g, IFN-γ release by C3 CTLs incubated with COS-Db
cells pulsed with wild-type (circles) or mutant (squares) spectrin-β2 peptides.
Data are representative of three independent experiments. Samples in b and f were
compared to d42m1 using an unpaired, two-tailed Student's t-test
(*P < 0.05, **P < 0.01, ***P < 0.001; NS, not significant).

The immunodominance of mutant spectrin-β2 in driving tumour
rejection in many ways resembles that of certain viral antigens²⁵ and is
probably due to the presence in d42m1 of four copies of chromosome 11,
each of which carries the spectrin-β2 gene, thereby producing a highly
abundant neoepitope that binds to H-2Db 750-fold more strongly than
the wild-type sequence. More work is needed to determine which of
the other mutations, if any, selectively expressed in d42m1 regressors
function as rejection antigens. Immunoepitope analysis of parental
H31m1 reveals that it expresses multiple potential strong neoantigens
(19 potential strong binders to H-2Db and 58 to H-2Kb) (Supplementary
Fig. 11a) and induces both H-2Db- and H-2Kb-restricted
CD8+ T-cell responses during rejection (Supplementary Fig. 11b).
This result suggests that H31m1 has an even more complex antigenicity
than d42m1, which probably explains why H31m1 never produces
escape tumours in wild-type mice (Supplementary Fig. 11c).

Figure 4 | Mutant spectrin-β2 is a major rejection antigen of d42m1.
a, Mutant spectrin-β2-specific CD8+ T cells were detected by tetramer staining in
tumours and draining lymph nodes (DLNs) from mice challenged with d42m1
parental cells, but not d42m1-es3 cells, on day 11 post-transplant. APC,
allophycocyanin; PE, phycoerythrin. b, Quantification and kinetics of mutant
spectrin-β2 tetramer staining in mice challenged with d42m1 parental cells (n = 3,
circles) or d42m1-es3 cells (n = 3, squares). c, Growth of d42m1-es3 tumour cell
clones transduced with wild-type (n = 5, squares) or mutant spectrin-β2 (n = 5,
circles) and control d42m1-es3 cells (n = 5, triangles) after transplantation
(1 × 10⁶ cells) into wild-type mice. Data are presented as average tumour
diameter ± s.e.m. d, d42m1-es3 tumours reconstituted with wild-type (WT.3) or
mutant spectrin-β2 (mu.14) were harvested at day 11 and CD8+ T cells were
stained with mutant spectrin-β2 tetramers. e, Growth of a mixture of d42m1-T2-RFP
(95%) and d42m1-T3-GFP (5%) after transplantation (1 × 10⁶ total cells)
into wild-type (n = 5, solid lines, closed squares) or Rag2−/− (n = 2, dashed lines,
open squares) mice. f, Tumour outgrowth in Rag2−/− or wild-type (WT) mice
treated or untreated with monoclonal antibodies that deplete CD4+ or CD8+ T
cells after challenge with 1 × 10⁶ cells of a d42m1 mixture (95% d42m1-T2-RFP
and 5% d42m1-T3-GFP). Data are presented as per cent tumour-positive mice
from 2-4 independent experiments (n = 2-5 mice per group). g, h, GFP and RFP
expression (g) and mutant spectrin-β2 expression (h) were analysed in the
d42m1-T2-RFP/d42m1-T3-GFP tumour cell mixture before injection and in
tumours that grew out in Rag2−/− mice (RagPass) or escaped in wild-type mice, by
flow cytometry (g) or qRT-PCR (h). Data are representative of two independent
experiments. Samples were compared using an unpaired, two-tailed Student's
t-test (*P < 0.05, **P < 0.01, ***P < 0.001; NS, not significant).

Chemically induced tumours have had a critical role in the history of
tumour immunology, providing the first unequivocal demonstration
of tumour-specific antigens²⁶,²⁷ and, subsequently, the first evidence of
cancer immunoediting. It is therefore significant that this same
model has now provided new insights into the antigenic targets of
cancer immunoediting and some of the key molecular mechanisms
that drive the process. Although more work is needed to determine
whether and how frequently this process occurs during development
of spontaneous and carcinogen-induced human cancers, it is tempting
to speculate that a genomics approach to tumour antigen identification
could, in the future, facilitate the development of individualized cancer
immunotherapies directed at tumour-specific, rather than cancer-associated,
antigens.


METHODS SUMMARY 


d42m1 and H31m1 MCA-induced sarcomas were generated in male 129/Sv
Rag2−/− mice as previously described¹. Total RNA was isolated from low-passage
MCA-induced sarcoma cell lines and skin fibroblasts from male 129/Sv Rag2−/−
mice using the RNeasy Mini kit (Qiagen), and cDNA was prepared using oligo(dT)
primers and SuperScript II Reverse Transcriptase (Invitrogen). Illumina libraries
prepared with this cDNA were hybridized to biotinylated Agilent mouse exome
probes. Library components were captured using streptavidin-coated magnetic
beads (DynaBeads), PCR amplified and sequenced using an Illumina GAIIx analyser
(cDNA CapSeq). Putative somatic mutations were identified using VarScan 2
(v.2.2.4). Missense mutations were analysed for potential neoepitope binding to
MHC class I using an algorithm²⁰ available at the Immune Epitope Database and
Analysis Resource (http://www.immuneepitope.org) and were expressed as affinity
values (reciprocal of the predicted IC50 multiplied by 100).

All tumour cell lines were injected subcutaneously into the flank of naive syngeneic
male mice (1 × 10⁶ cells). Ten d42m1 tumour cell clones were isolated from
the parental cell line by limiting dilution. d42m1 escape tumours were harvested
from wild-type mice and used to establish cell lines. To generate
the C3 d42m1-specific CTL clone, splenocytes from a mouse that had rejected d42m1
were harvested, stimulated with parental d42m1 target cells that had been pre-treated with
100 U ml⁻¹ IFN-γ for 48 h and irradiated (100 Gy), and cloned by limiting
dilution. To clone the antigen recognized by the C3 CTL clone, a d42m1 cDNA
library was cloned into pcDNA3 (Invitrogen), transfected into COS cells expressing
mouse H-2Db and screened for C3 reactivity by IFN-γ ELISA (eBioscience).
Mutant spectrin-β2 expression was detected by RT-PCR using mutation-specific
primers. H-2Db tetramers were generated with mutant spectrin-β2 (residues 905-913)
peptides by the NIH Tetramer Facility (Emory).


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 


Received 17 August; accepted 2 December 2011. 
Published online 8 February 2012. 


1. Shankaran, V. et al. IFNγ and lymphocytes prevent primary tumour development
and shape tumour immunogenicity. Nature 410, 1107-1111 (2001).

2. Dunn, G. P., Bruce, A. T., Ikeda, H., Old, L. J. & Schreiber, R. D. Cancer
immunoediting: from immunosurveillance to tumor escape. Nature Immunol. 3,
991-998 (2002).

3. Koebel, C. M. et al. Adaptive immunity maintains occult cancer in an equilibrium
state. Nature 450, 903-907 (2007).

4. Vesely, M. D., Kershaw, M. H., Schreiber, R. D. & Smyth, M. J. Natural innate and
adaptive immunity to cancer. Annu. Rev. Immunol. 29, 235-271 (2011).

5. Schreiber, R. D., Old, L. J. & Smyth, M. J. Cancer immunoediting: integrating
immunity's roles in cancer suppression and promotion. Science 331, 1565-1570
(2011).

6. Futreal, P. A. et al. A census of human cancer genes. Nature Rev. Cancer 4, 177-183
(2004).

7. Chen, A. C. & Herschman, H. R. Tumorigenic methylcholanthrene transformants of
C3H/10T1/2 cells have a common nucleotide alteration in the c-Ki-ras gene. Proc.
Natl Acad. Sci. USA 86, 1608-1611 (1989).


404 | NATURE | VOL 482 | 16 FEBRUARY 2012 


8. Tuveson, D. A. et al. Endogenous oncogenic K-ras(G12D) stimulates proliferation
and widespread neoplastic and developmental defects. Cancer Cell 5, 375-387
(2004).

9. Kirsch, D. G. et al. A spatially and temporally restricted mouse model of soft tissue
sarcoma. Nature Med. 13, 992-997 (2007).

10. Ley, T. J. et al. DNA sequencing of a cytogenetically normal acute myeloid
leukaemia genome. Nature 456, 66-72 (2008).

11. Ding, L. et al. Genome remodelling in a basal-like breast cancer metastasis and
xenograft. Nature 464, 999-1005 (2010).

12. Shah, S. P. et al. Mutational evolution in a lobular breast tumour profiled at single
nucleotide resolution. Nature 461, 809-813 (2009).

13. Lee, W. et al. The mutation spectrum revealed by paired genome sequences from a
lung cancer patient. Nature 465, 473-477 (2010).

14. Pleasance, E. D. et al. A small-cell lung cancer genome with complex signatures of
tobacco exposure. Nature 463, 184-190 (2010).

15. Totoki, Y. et al. High-resolution characterization of a hepatocellular carcinoma
genome. Nature Genet. 43, 464-469 (2011).

16. Puente, X. S. et al. Whole-genome sequencing identifies recurrent mutations in
chronic lymphocytic leukaemia. Nature 475, 101-105 (2011).

17. Pleasance, E. D. et al. A comprehensive catalogue of somatic mutations from a
human cancer genome. Nature 463, 191-196 (2010).

18. Boon, T., Coulie, P. G., Van den Eynde, B. J. & van der Bruggen, P. Human T cell
responses against melanoma. Annu. Rev. Immunol. 24, 175-208 (2006).

19. Segal, N. H. et al. Epitope landscape in breast and colorectal cancer. Cancer Res. 68,
889-892 (2008).

20. Nielsen, M. et al. Reliable prediction of T-cell epitopes using neural networks with
novel sequence representations. Protein Sci. 12, 1007-1017 (2003).

21. van der Bruggen, P. et al. A gene encoding an antigen recognized by cytolytic T
lymphocytes on a human melanoma. Science 254, 1643-1647 (1991).

22. Khong, H. T. & Restifo, N. P. Natural selection of tumor variants in the generation of
"tumor escape" phenotypes. Nature Immunol. 3, 999-1005 (2002).

23. Dunn, G. P., Old, L. J. & Schreiber, R. D. The three Es of cancer immunoediting. Annu.
Rev. Immunol. 22, 329-360 (2004).

24. DuPage, M., Mazumdar, C., Schmidt, L. M., Cheung, A. F. & Jacks, T. Expression of
tumour-specific antigens underlies cancer immunoediting. Nature doi:10.1038/nature10803
(this issue).

25. Yewdell, J. W. Confronting complexity: real-world immunodominance in antiviral
CD8+ T cell responses. Immunity 25, 533-543 (2006).

26. Prehn, R. T. & Main, J. M. Immunity to methylcholanthrene-induced sarcomas.
J. Natl Cancer Inst. 18, 769-778 (1957).

27. Old, L. J. & Boyse, E. A. Immunology of experimental tumors. Annu. Rev. Med. 15,
167-186 (1964).


Supplementary Information is linked to the online version of the paper at 
www.nature.com/nature. 


Acknowledgements We are grateful to J. Archambault for expert technical assistance, 
T. H. Hansen (Washington University) for providing MHC class I antibodies, S. Horvath
and P. M. Allen (Washington University) for synthesizing MHC class I peptides, the
National Institutes of Health (NIH) Tetramer Core Facility for producing MHC class I
tetramers, and T. S. Stappenbeck (Washington University) for technical help in 
recovering frozen tumour samples. We also thank E. Unanue, P. M. Allen and J. Bui for 
criticisms and comments, all members of the Schreiber laboratory for discussions, and 
the many members of The Genome Institute at Washington University School of 
Medicine, especially L. Ding for her insights into our analytical approaches. This work 
was supported by grants to R.D.S. from the National Cancer Institute, the Ludwig 
Institute for Cancer Research, the Cancer Research Institute, and the WWWW 
Foundation; and to E.R.M. from the National Human Genome Research Institute. M.D.V. 
is supported by a pre-doctoral fellowship from the Cancer Research Institute. J.P.A. is 
supported by the Howard Hughes Medical Institute and the Ludwig Center for Cancer 
Immunotherapy; M.J.S. by the National Health and Medical Research Council of
Australia (NH&MRC) and the Association for International Cancer Research; and
L.J.O. by the Ludwig Institute for Cancer Research and the Cancer Research Institute.


Author Contributions H.M. and M.D.V. were involved in all aspects of this study 
including planning and performing experiments, analysing and interpreting data, and 
writing the manuscript. C.G.R., R.U., C.D.A., J.M.W., Y.-S.C. and L.K.S. also performed
experiments and analysed data. V.J.M., R.D. and members of The Genome Institute 
performed Illumina library preparation, cDNA capture and sequencing as well as 
validation Roche/454 pyrosequencing and 3730 sequencing. D.C.K. analysed and 
interpreted sequencing data from this study and previously published cancer genome 
data. J.H. and T.W. analysed cDNA CapSeq data for potential MHC class I epitopes.
M.C.W. performed the phylogenetic analysis on the tumour cells. J.P.A., M.J.S. and L.J.O.
interpreted data and contributed to the preparation of the final manuscript. E.R.M. and
R.D.S. oversaw all the work performed, planned experiments, interpreted data and
wrote the manuscript. 


Author Information Reprints and permissions information is available at 
www.nature.com/reprints. The authors declare no competing financial interests. 
Readers are welcome to comment on the online version of this article at 
www.nature.com/nature. Correspondence and requests for materials should be 
addressed to R.D.S. (schreiber@immunology.wustl.edu). 


©2012 Macmillan Publishers Limited. All rights reserved 


METHODS 


Mice. Ifngr1−/− and Ifnar1−/− mice on a 129/Sv background were originally provided by M. Aguet and were bred in our specific pathogen-free animal facility. Wild-type and Rag2−/− mice were purchased from Taconic Farms. All mice were male, were on a 129/Sv background and were housed in our specific pathogen-free animal facility. For all experiments, male mice were 8-12 weeks of age, and studies were performed in accordance with procedures approved by the AAALAC-accredited Animal Studies Committee of Washington University in St. Louis.
Tumour transplantation. MCA-induced sarcomas used in this study were generated in male 129/Sv strain wild-type or Rag2−/− mice and banked as low-passage tumour cells as previously described. Tumour cells derived from frozen stocks were propagated in vitro in RPMI media (Hyclone) supplemented with 10% FCS (Hyclone) and injected subcutaneously in 150 µl of endotoxin-free PBS into the flanks of recipient mice. Tumour cells were >90% viable at the time of injection as assessed by trypan blue exclusion, and tumour size was quantified as the average of two perpendicular diameters. For antibody depletion studies, 250 µg of control IgG (PIP), anti-CD4 (GK1.5) or anti-CD8α (YTS169.4) was injected intraperitoneally into mice at day −1 and every 7 days thereafter.
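The size measurement and depletion schedule above reduce to simple arithmetic. The following is a minimal sketch, assuming diameters in millimetres and transplantation on day 0; the function names and the `last_day` cut-off are hypothetical and not part of the original protocol.

```python
def tumour_size(d1_mm: float, d2_mm: float) -> float:
    """Tumour size as the mean of two perpendicular diameters."""
    return (d1_mm + d2_mm) / 2


def depletion_days(last_day: int, first_day: int = -1, interval: int = 7) -> list[int]:
    """Antibody injection days relative to transplantation (day 0):
    day -1, then every 7 days up to and including a hypothetical last_day."""
    return list(range(first_day, last_day + 1, interval))


print(tumour_size(8.0, 6.0))  # 7.0
print(depletion_days(20))     # [-1, 6, 13, 20]
```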

Isolation of normal skin fibroblasts. Skin fibroblasts were isolated from three independent male 129/Sv Rag2−/− pups by harvesting skin and incubating it in 0.25% trypsin (Hyclone) at 37 °C for 30 min before washing in DMEM media (Hyclone). After washing, chunks of skin were filtered to achieve single-cell suspensions, which were cultured in vitro in DMEM media. After three passages, skin fibroblasts were harvested to isolate genomic DNA and total RNA.

Extraction of genomic or complementary DNA. Genomic DNA from sarcoma 
cells and normal skin fibroblasts was extracted using DNeasy Blood & Tissue Kit 
(Qiagen). For cDNA isolation, total RNA from sarcoma cells and normal skin 
fibroblasts was isolated using RNeasy Mini kit (Qiagen) and cDNA was synthesized 
using oligo (dT) primers and SuperScript II Reverse Transcriptase (Invitrogen). 
cDNA CapSeq. cDNA samples from each tumour (100 ng) were constructed into 
Illumina libraries according to the manufacturer’s protocol (Illumina) with the 
following modifications. First, cDNA was fragmented using a Covaris S2 DNA Sonicator (Covaris) in 1× end-repair buffer, followed by the direct addition of the enzyme repair cocktail (Lucigen). Fragment sizes ranged between 100 and 500 bp. Second, Illumina adaptor-ligated DNA was amplified in four 50 µl PCRs for five cycles using 4 µl adaptor-ligated cDNA, 2× Phusion Master Mix and 250 nM forward and reverse primers, 5'-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATC and 5'-CAAGCAGAAGACGGCATACGAGATGTGACTGGAGTTCAGACGTGTGCTCTTCCGATC, respectively. Third, Solid Phase Reversible Immobilization (SPRI) bead cleanup was used to purify the PCR-amplified library and to select for 300-500 bp fragments. Five hundred nanograms of the size-fractionated Illumina library were hybridized with the Agilent mouse exome reagent. After hybridization at 65 °C for 24 h, we added 50 µl of DynaBeads M-270 streptavidin-coated paramagnetic beads (10 mg ml−1)
to selectively remove the biotinylated Agilent probes and hybridized cDNA library 
fragments. The beads were washed according to the manufacturer's protocol (Agilent), and the captured library fragments were released into solution using 50 µl of 0.125 N NaOH and neutralized with an equal volume of neutralization buffer (Agilent). The recovered fragments were then PCR amplified according to the manufacturer's protocol using 11 PCR cycles. Illumina library quantification was completed using the KAPA SYBR FAST qPCR Kit (KAPA Biosystems). The qPCR result was used to determine the quantity of library necessary to produce 180,000 clusters on a single lane of the Illumina GAIIx. One lane of 100 bp paired-end data was generated for each captured sample (as cDNA was used as the source for sequencing, we refer to this process as cDNA Capture Sequencing, or cDNA CapSeq). Illumina reads were aligned to the NCBI build 37 (Mm9) mouse reference sequence using BWA v.0.5.5 (with -q 5 soft trimming). Alignments from multiple lanes for the same sample were merged using SAMtools r599, and duplicates were marked using Picard v.1.29.

Mutation detection and annotation. Putative somatic mutations were identified using VarScan 2 (v.2.2.4) with the parameters '--min-coverage 3 --min-var-freq 0.08 --p-value 0.10 --somatic-p-value 0.05 --strand-filter 1' and a minimum mapping quality of 10. Variants whose supporting reads exhibited read-position bias (average read position <10 or >90), strand bias (>99% of reads on one strand) or a mapping-quality difference (score difference >30, or mismatch quality sum difference >100) relative to reference-supporting reads were removed as probable false positives. We also required that the variant allele be present in at least 10% of tumour reads and no more than 5% of normal reads. The SNVs meeting these criteria were annotated using an internal database of GenBank/Ensembl transcripts (v58_73k). When a variant was annotated using multiple transcripts, the annotation with the most severe effect was used. Non-silent coding mutations
(missense, nonsense/nonstop or splice site) were prioritized for downstream 
analysis. 
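The post-calling false-positive filter can be expressed as a single predicate over per-site read statistics. This is a minimal sketch of the thresholds stated above, not the authors' filter code: the `Candidate` fields are hypothetical, and the direction of the two quality comparisons (variant-supporting reads scoring worse than reference-supporting reads) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Per-site read statistics for one putative somatic SNV (hypothetical fields)."""
    avg_read_pos: float   # mean position of the variant base within supporting reads
    var_fwd: int          # variant-supporting reads on the forward strand
    var_rev: int          # variant-supporting reads on the reverse strand
    ref_mapq: float       # mean mapping quality of reference-supporting reads
    var_mapq: float       # mean mapping quality of variant-supporting reads
    ref_mmqs: float       # mean mismatch quality sum of reference-supporting reads
    var_mmqs: float       # mean mismatch quality sum of variant-supporting reads
    tumour_vaf: float     # variant allele fraction in tumour reads (0-1)
    normal_vaf: float     # variant allele fraction in normal reads (0-1)


def passes_filters(c: Candidate) -> bool:
    """Return True if the call survives all thresholds described in the text."""
    # Read-position bias: variant sits near the read ends on average.
    if c.avg_read_pos < 10 or c.avg_read_pos > 90:
        return False
    # Strand bias: more than 99% of variant reads on one strand.
    total = c.var_fwd + c.var_rev
    if total == 0 or max(c.var_fwd, c.var_rev) / total > 0.99:
        return False
    # Quality differences relative to reference-supporting reads (direction assumed).
    if c.ref_mapq - c.var_mapq > 30 or c.var_mmqs - c.ref_mmqs > 100:
        return False
    # Allele-fraction requirements: >= 10% in tumour, <= 5% in normal.
    return c.tumour_vaf >= 0.10 and c.normal_vaf <= 0.05
```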
Mutation rate and overlap comparisons. Mutation rates were estimated for each tumour sample using the number of putative 'tier 1' SNVs (missense, nonsense/nonstop, splice site, silent or noncoding RNA). To account for variability in coverage between samples, the SNV count for each tumour sample (S) was divided by a coverage factor (F), computed as the fraction of all tier 1 SNVs identified in any tumour sample (n = 16,991) that were covered by at least four reads in a given sample. For example, in the d42m1 parental sample, 15,852 of 16,991 tier 1 SNV positions were covered, giving a coverage factor of 93.30%. The number of coverage-adjusted mutations in each sample was divided by the total size of tier 1 space in the mouse genome (43.884 Mbp) to determine the number of coding mutations per megabase (R).


$$R = \frac{S/F}{43.884\ \mathrm{Mbp}}$$
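The coverage correction and the per-megabase rate can be checked numerically. This minimal sketch reproduces the d42m1 coverage factor from the counts given above; the SNV count S = 1,000 in the usage line is a made-up value for illustration, not a figure reported for any sample.

```python
TOTAL_TIER1_SNVS = 16_991  # tier 1 SNVs identified in any tumour sample
TIER1_SPACE_MBP = 43.884   # size of tier 1 space in the mouse genome (Mbp)


def coverage_factor(covered_positions: int, total: int = TOTAL_TIER1_SNVS) -> float:
    """F: fraction of all tier 1 SNV positions covered by >= 4 reads in one sample."""
    return covered_positions / total


def mutations_per_mb(snv_count: int, f: float) -> float:
    """R = (S / F) / 43.884 Mbp."""
    return (snv_count / f) / TIER1_SPACE_MBP


f_d42m1 = coverage_factor(15_852)
print(f"F = {f_d42m1:.2%}")  # F = 93.30%
# Hypothetical S = 1,000, used only to show the calculation.
print(f"R = {mutations_per_mb(1_000, f_d42m1):.2f} mutations per Mb")
```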


For the mutation overlap comparisons and relatedness-to-parental-tumour analysis, only high-confidence missense mutations were used (that is, 20 or above).
A mutation was considered ‘shared’ between two samples if both samples had a 
predicted mutation at the same genomic position. For the comparison of mutated 
genes between d42m1 and H31m1 parental lines, a gene was considered ‘shared’ if 
both d42m1 and H31m1 samples had a predicted missense mutation in that gene, 
even if the mutations did not occur at the same position. 
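The two definitions of 'shared' differ only in the key being compared: genomic position for mutations, gene symbol for genes. A minimal sketch follows; the `Mutation` record and the toy calls are hypothetical and are not data from the study.

```python
from typing import NamedTuple


class Mutation(NamedTuple):
    chrom: str
    pos: int
    gene: str


def shared_positions(a: list[Mutation], b: list[Mutation]) -> set[tuple[str, int]]:
    """Mutations 'shared' between two samples: same genomic position in both."""
    return {(m.chrom, m.pos) for m in a} & {(m.chrom, m.pos) for m in b}


def shared_genes(a: list[Mutation], b: list[Mutation]) -> set[str]:
    """Genes 'shared' between two samples: mutated in both, at any position."""
    return {m.gene for m in a} & {m.gene for m in b}


# Toy calls for illustration only.
sample_1 = [Mutation("chr1", 100, "GeneA"), Mutation("chr2", 500, "GeneB")]
sample_2 = [Mutation("chr1", 100, "GeneA"), Mutation("chr2", 650, "GeneB"),
            Mutation("chr3", 10, "GeneC")]
print(shared_positions(sample_1, sample_2))      # {('chr1', 100)}
print(sorted(shared_genes(sample_1, sample_2)))  # ['GeneA', 'GeneB']
```

GeneB counts as a shared gene even though its two mutations sit at different positions, which mirrors the gene-level comparison between d42m1 and H31m1.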

Roche/454 sequencing and validation. PCR primers were designed for 11 SNVs 
predicted to be somatic in d42m1 tumour samples, as well as 11 control sites that 
were H31m1-specific, low-confidence, or removed by the false-positive filter. All 
22 SNVs were PCR amplified individually in 11 samples (SK1.1, d42m1, H31m1, 
T2, T3, T5, T9, T10, es1, es2 and es3) using MID-tailed primers to enable sample 
identification. PCR products were pooled together before sequencing on a quarter 
run of the Roche/454 Titanium platform. Read sequences and quality scores were 
extracted from 454 data files using sffinfo (454 proprietary software) then aligned 
to the mouse build 37 reference sequence (Mm9) using SSAHA2 v.2.5.3 with
the SAM output option. Alignments were imported to BAM format and a 'pileup'
assembly file was generated using SAMtools v.0.1.18. The average 454 sequence
depth for targeted positions was 1,216× per sample. Validation read counts and
allele frequencies in each sample at each variant position were determined using
the pileup2cns command of VarScan v.2.2.7. At least 20 reads with base quality of
20 or higher were required to confirm or refute a variant. 454 sequencing data and 
the primers used are presented in Supplementary Table 4. 
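The depth rule for validation can be sketched as follows. This is an illustration, not the VarScan implementation: the function name and inputs (per-read base qualities at one site) are hypothetical, and because the text does not state the allele-fraction cut-off used to call a site confirmed or refuted, the sketch stops at returning the allele fraction.

```python
from typing import Optional


def assess_site(ref_base_quals: list[int], var_base_quals: list[int],
                min_reads: int = 20, min_base_q: int = 20) -> Optional[float]:
    """Variant allele fraction at one site, counting only reads whose base
    quality at the site is >= min_base_q. Returns None when fewer than
    min_reads such reads exist, i.e. too little data to confirm or refute."""
    ref = sum(q >= min_base_q for q in ref_base_quals)
    var = sum(q >= min_base_q for q in var_base_quals)
    depth = ref + var
    if depth < min_reads:
        return None
    return var / depth
```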

3730 sequencing and validation. Eight SNVs predicted to be somatic were selected for validation by PCR and 3730 sequencing in flow-sorted CD45+ and CD45− cells from the original d42m1 tumour. Genomic DNA and cDNA from CD45− (tumour) cells, and cDNA from CD45+ (normal immune) cells, were used for PCR amplification, and the PCR products were then sequenced individually on an ABI 3730 using universal primers. Manual review was performed using amplicon-based assembly in the Integrative Genomics Viewer (IGV) to determine the somatic status of each site. Data are presented in Supplementary Table 4.
MHC class I epitope prediction. All missense mutations for each d42m1-related tumour or H31m1 were analysed for the potential to form MHC class I neoepitopes that bind to either H-2Db or H-2Kb molecules. The artificial neural network (ANN) algorithm provided by the Immune Epitope Database and Analysis Resource (http://www.immuneepitope.org) was used to predict epitope binding affinities, and the results were ultimately expressed as affinity values (1/IC50 × 100). Predicted
strong affinity epitopes expressed in d42m1 regressor tumours are listed in 
Supplementary Table 5. 
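Before any binding predictor can score a missense mutation, the candidate peptides spanning the mutated residue have to be enumerated. The sketch below shows that step and the affinity-value transform from the text; it does not reproduce the IEDB ANN predictor. The peptide lengths of 8 and 9 residues are an assumption (H-2Kb and H-2Db typically present 8-mers and 9-mers, respectively), and the function names are hypothetical.

```python
def mutant_peptides(protein: str, pos: int, alt: str,
                    lengths: tuple[int, ...] = (8, 9)) -> list[str]:
    """All peptides of the given lengths that contain a missense change.

    pos is the 1-based residue number of the substitution; alt is the new residue.
    """
    i = pos - 1
    mutant = protein[:i] + alt + protein[i + 1:]
    peptides = []
    for k in lengths:
        # A k-mer starting at `start` covers residue i when start <= i <= start + k - 1,
        # and it must also fit inside the protein.
        for start in range(max(0, i - k + 1), min(i, len(mutant) - k) + 1):
            peptides.append(mutant[start:start + k])
    return peptides


def affinity_value(ic50_nm: float) -> float:
    """Affinity value as defined in the text: 1/IC50 x 100."""
    return 100.0 / ic50_nm


# Toy sequence (not a real protein): residue 10 (L) replaced by A.
peps = mutant_peptides("ACDEFGHIKLMNPQRSTVWY", 10, "A")
print(len(peps))  # 17 (8 eight-mers + 9 nine-mers)
```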

Phylogenetic analysis of tumour samples. Sequencing data from normal Rag2−/− fibroblasts, d42m1 parental cells, d42m1 regressor clones, d42m1 progressor clones, d42m1 escape tumours and H31m1 tumour cells were compared using the PHYLogeny Inference Package (PHYLIP) to generate a phylogenetic tree
displaying the relatedness of each sample. 
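The text does not say which PHYLIP program or input encoding was used. One plausible encoding, shown here only as a sketch under that assumption, treats each mutation as a binary character (1 = present) so that a discrete-character parsimony program such as `pars` or `mix` could build the tree, with the normal fibroblasts as an all-zero reference. The sample names and mutation IDs below are hypothetical.

```python
def phylip_binary_matrix(samples: dict[str, set[str]]) -> str:
    """Encode mutation presence/absence as a PHYLIP-style discrete-character matrix.

    First line: number of samples and number of characters. Each following line
    is a sample name padded or truncated to 10 characters, then one 0/1 per mutation.
    """
    sites = sorted(set().union(*samples.values()))
    lines = [f"{len(samples)} {len(sites)}"]
    for name, muts in samples.items():
        chars = "".join("1" if s in muts else "0" for s in sites)
        lines.append(f"{name[:10]:<10}{chars}")
    return "\n".join(lines)


print(phylip_binary_matrix({
    "normal": set(),
    "parental": {"m1", "m2"},
    "clone1": {"m1"},
}))
```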

Antibodies. Anti-H-2Kb (B8-24-3) and anti-H-2Db (B22/249) monoclonal antibodies were provided by T. H. Hansen (Washington University School of Medicine). Anti-CD4 (GK1.5) and anti-CD8α (YTS169.4) monoclonal antibodies and control immunoglobulin (PIP, a monoclonal antibody specific for bacterial glutathione S-transferase) were produced from hybridoma supernatants and purified in endotoxin-free form by Protein G affinity chromatography (Leinco Technologies). Purified rat IgG was purchased from Sigma (St. Louis). CD45-FITC, CD45-PE, CD8-APC and purified anti-CD16/32 were purchased from BioLegend.

cDNA library construction and screening. To generate a d42m1 tumour cell 
cDNA library, mRNA was isolated from parental d42m1 tumour cells using a 
QuickPrep mRNA Purification kit (Amersham), converted into cDNA using 
SuperScript II First Strand Synthesis System (Invitrogen) and inserted into the 
EcoRI site of the expression vector pcDNA3 (Invitrogen). The cDNA library was 
divided into pools of 100 bacterial colonies with 200-300 ng of DNA from each 
pool transfected into 2.5 × 10* monkey COS cells engineered to ectopically express mouse H-2Db (COS-Db cells) using Lipofectamine 2000. After 48 h, 5 × 10° C3 CTL cells were added, and supernatants were assayed for IFN-γ release 24 h later by ELISA. A single positive cDNA clone was isolated after screening 120,000 cDNA colonies. The putative H-2Db-binding peptide VAVVNQIAL was predicted using the algorithm available at the Immune Epitope Database and Analysis Resource, http://www.immuneepitope.org/. The peptides were produced by P. Allen and S. Horvath (Washington University School of Medicine).
Expression vectors. Full-length cDNAs encoding wild-type and mutant spectrin-β2 were cloned from parental d42m1 tumour cells by RT-PCR using the primer pair 5'-TGAGACAGTCAAGATGACGACCACGGTAGCCACA-3' and 5'-CGGGACAACAGGGAAGTTCACTTCTTCTTGCCGA-3'. Wild-type and mutant spectrin-β2 cDNAs were subcloned from the TOPO-XL vector (Invitrogen) into the retrovirus (RV)-GFP vector. To generate the RV-RFP vector, full-length cDNA encoding RFP was cloned from the pTurboRFP-C vector (Evrogen) by RT-PCR using the primer pair 5'-ATCTCAGAATTCATGAGCGAGCTGATCAAGGA-3' and 5'-ATCTCAGGATCCTTATCTGTGCCCCAGTTTGCTAG-3'. RFP cDNA was then cloned into the RV vector. To remove candidate T-cell epitopes in RFP, the nucleotide A at position 334 of the cDNA was replaced by G, resulting in the amino acid substitution N112D. Coding sequences of the constructs were verified by DNA sequencing (BigDye method; Applied Biosystems). The dominant-negative version of the IFNGR1 subunit (IFNGR1ΔIC) was expressed in H31m1 and d42m1 tumour cells as previously described.
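The link between nucleotide 334 and residue 112 is a codon-numbering calculation: position 334 is the first base of codon 112, and an A-to-G change at the first position of an asparagine codon (AAT or AAC) yields an aspartate codon (GAT or GAC). The sketch below checks this on a made-up coding sequence; the RFP sequence itself is not given in the text, so the toy CDS and function names are hypothetical.

```python
# Standard genetic code; codons indexed by bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def translate_codon(codon: str) -> str:
    """One-letter amino acid for a DNA codon ('*' = stop)."""
    i, j, k = (BASES.index(b) for b in codon.upper())
    return AMINO_ACIDS[16 * i + 4 * j + k]


def point_substitution(cds: str, nt_pos: int, new_base: str) -> tuple[int, str, str]:
    """Effect of replacing the base at 1-based position nt_pos of a CDS (from the ATG).

    Returns (codon number, reference amino acid, mutant amino acid).
    """
    codon_no = (nt_pos - 1) // 3 + 1
    start = (codon_no - 1) * 3
    ref_codon = cds[start:start + 3]
    offset = nt_pos - 1 - start
    alt_codon = ref_codon[:offset] + new_base + ref_codon[offset + 1:]
    return codon_no, translate_codon(ref_codon), translate_codon(alt_codon)


# Hypothetical CDS: ATG, 110 alanine codons, then AAC (Asn) as codon 112, then a stop.
toy_cds = "ATG" + "GCT" * 110 + "AAC" + "TAA"
print(point_substitution(toy_cds, 334, "G"))  # (112, 'N', 'D')
```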

Establishment of CTL lines and clones. To generate the d42m1-specific C3 CTL 
clone, wild-type mice were injected with 1 × 10° parental d42m1 tumour cells. Fourteen days later, the spleen was harvested from a mouse that rejected the tumour, and a CTL line was established by stimulating 40 × 10° splenocytes with 2 × 10° parental d42m1 tumour cells pre-treated for 48 h with 100 U ml−1 of recombinant murine IFN-γ and irradiated (100 Gy). After CD8+ T-cell purification using magnetic beads (Miltenyi Biotec) and limiting dilution, the CTL clone
C3 was obtained. 

Measurement of IFN-γ production. To generate target cells, tumour cells were treated with 100 U ml−1 IFN-γ for 48 h and irradiated with 100 Gy before use. The C3 CTL clone was co-cultured at the indicated ratios with target tumour cells (10,000 or 5,000 cells) in 96-well round-bottomed plates overnight. IFN-γ in supernatants was quantified using an IFN-γ ELISA kit (eBioscience). For blocking assays, 10 µg ml−1 of anti-CD8 (YTS-169.4), anti-CD4 (GK1.5) or control
immunoglobulin (PIP) were added to the cell culture of effector (C3 CTL clone) 
and target cells (tumours). 

Cytotoxicity assay. To generate target cells, tumour cells were treated with 100 U ml−1 rMuIFN-γ for 48 h before use. One million tumour cells were labelled with 25 µCi of Na₂⁵¹CrO₄ (PerkinElmer) for 90 min at 37 °C and washed, and 10,000 cells were seeded per well in 96-well round-bottomed plates. The C3 CTL clone was co-cultured with the tumour target cells at the indicated effector/target cell ratios and incubated for 4 h at 37 °C in 5% CO2. Radioactivity was measured in the supernatants, and per cent specific killing was defined as (experimental c.p.m. − spontaneous c.p.m.)/(maximal (detergent) c.p.m. − spontaneous c.p.m.) × 100. Data points were obtained in duplicate.
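The specific-killing formula translates directly into code. In this minimal sketch, replicate wells are averaged as raw c.p.m. before the formula is applied; that ordering is an assumption, since the text states only that points were obtained in duplicate. The counts in the usage line are hypothetical.

```python
from statistics import mean


def percent_specific_lysis(experimental_cpm: list[float],
                           spontaneous_cpm: list[float],
                           maximal_cpm: list[float]) -> float:
    """Per cent specific killing from 51Cr-release counts:
    (experimental - spontaneous) / (maximal - spontaneous) x 100,
    using the mean of replicate wells for each term."""
    e = mean(experimental_cpm)
    s = mean(spontaneous_cpm)
    m = mean(maximal_cpm)
    return 100 * (e - s) / (m - s)


# Hypothetical duplicate counts for one effector/target ratio.
print(percent_specific_lysis([1500, 1700], [480, 520], [2450, 2550]))  # 55.0
```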

Fluorescence-activated cell sorting analysis. For flow cytometry, cells were stained for 20 min at 4 °C with 500 ng of Fc block (anti-CD16/32) and 200 ng of CD45, CD4 or CD8α in 100 µl of staining buffer (PBS with 1% FCS and 0.05% NaN3 (Sigma)). Propidium iodide (PI) (Sigma) was added at 1 µg ml−1 immediately before FACS analysis. For quantitative analysis of tumour-infiltrating lymphocytes/leukocytes (TIL) and lymph node populations, a CD45+ PI− gate was used, and gated events were collected on a FACSCalibur (BD Biosciences) and analysed using FlowJo software.


Tumour, draining lymph node and spleen harvest. After tumour cell transplantation, established tumours were excised from mice, minced and treated with 1 mg ml−1 type IA collagenase (Sigma) in HBSS (Hyclone) for 2 h at room temperature (22 °C). The ipsilateral inguinal tumour-draining lymph nodes and spleen
were also harvested and crushed between two glass slides and vigorously resus- 
pended to make single-cell suspensions. 

Tetramers. PE-conjugated H-2Db tetramers were prepared with mutant spectrin-β2 peptides and produced by the NIH Tetramer Core Facility (Emory
University). 

Mutation-specific RT-PCR and real-time RT-PCR. Total RNA from tumour 
cells was isolated by RNeasy Mini kit (Qiagen) and cDNA was synthesized from 
the total RNA using oligo (dT) primers and SuperScript II Reverse Transcriptase 
(Invitrogen). Real-time PCR specific for wild-type spectrin-β2, mutant spectrin-β2 and GAPDH was performed using the SYBR Green Mastermix kit (Applied Biosystems) on an ABI 7000. The primer sequences used for mutant spectrin-β2 are 5'-GGTGAACCAGATTGCACT-3' and 5'-TGTCCACCAGTTCTCTGAACT-3'.
Detection of mutation in spectrin-β2 cDNA. The point mutation in the spectrin-β2 gene creates a PstI restriction site (CGGCAG to CTGCAG; the G-to-T change at the second position of this sequence is the mutation). To amplify spectrin-β2 cDNA we used a forward primer (ACCCTGGCCCTGTACAAGAT) and a reverse primer (TAGACTCGATGACCTTGGTCT). The PCR conditions used were 94 °C for 2 min, followed by 35 cycles of 94 °C for 30 s, 55 °C for 30 s and 72 °C for 30 s. The PCR products were digested for 2 h at 37 °C with PstI restriction enzyme, which cleaves mutant, but not wild-type, spectrin-β2 and generates a 200 bp fragment from cDNA. The products were resolved by electrophoresis on a
1.2% agarose gel and visualized by ethidium bromide staining. 
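The diagnostic digest works because PstI recognizes CTGCAG (cutting between the A and the final G) and so acts only on the mutant allele. This minimal sketch predicts top-strand fragment lengths for an amplicon; the amplicons are hypothetical stand-ins, since the real spectrin-β2 amplicon sequence is not given, so the sketch does not reproduce the 200 bp fragment.

```python
PSTI_SITE = "CTGCAG"  # PstI recognition sequence; cuts CTGCA^G
CUT_OFFSET = 5        # top-strand cut falls after the 5th base of the site


def psti_fragments(amplicon: str) -> list[int]:
    """Top-strand fragment lengths after complete PstI digestion
    (the 4-nt 3' overhang is ignored)."""
    seq = amplicon.upper()
    cuts = []
    start = seq.find(PSTI_SITE)
    while start != -1:
        cuts.append(start + CUT_OFFSET)
        start = seq.find(PSTI_SITE, start + 1)
    edges = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(edges, edges[1:])]


wild_type = "A" * 50 + "CGGCAG" + "T" * 50  # hypothetical wild-type amplicon
mutant = "A" * 50 + "CTGCAG" + "T" * 50     # same amplicon carrying the G-to-T change
print(psti_fragments(wild_type))  # [106]
print(psti_fragments(mutant))     # [55, 51]
```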

Isolation of non-transformed cells from d42m1 biopsy. A frozen d42m1 
tumour biopsy from the original d42m1 tumour was thawed and treated with 
1 mg ml−1 type IA collagenase (Sigma) in HBSS for 2 h at room temperature. After filtration, single-cell suspensions were stained for 20 min at 4 °C with 500 ng of Fc block (anti-CD16/32) and 200 ng of CD45-PE in 100 µl of staining buffer. Propidium iodide was added at 1 µg ml−1 immediately before sorting. A CD45/PI gate was used, and the top 15% and the bottom 15% of gated events were collected using a FACSAria II (BD Biosciences). Sorted CD45+ cells (host leukocytes) and CD45− cells (primary d42m1 tumour cells) were collected, and genomic DNA as well as RNA was isolated to synthesize cDNA for 3730 sequencing to validate that the mutation calls detected by Illumina were somatic and
tumour specific. 

Statistical analysis. Samples were compared using an unpaired, two-tailed Student's t-test, unless otherwise specified.
Expression of tumour-specific antigens underlies cancer immunoediting


Michel DuPage, Claire Mazumdar, Leah M. Schmidt, Ann F. Cheung & Tyler Jacks


Cancer immunoediting is a process by which immune cells, particularly lymphocytes of the adaptive immune system, protect the host from the development of cancer and alter tumour progression by driving the outgrowth of tumour cells with decreased sensitivity to immune attack. Carcinogen-induced mouse models of cancer have shown that primary tumour susceptibility is thereby enhanced in immune-compromised mice, whereas the capacity for such tumours to grow after transplantation into wild-type mice is reduced. However, many questions about the process of cancer immunoediting remain unanswered, in part because of the known antigenic complexity and heterogeneity of carcinogen-induced tumours. Here we adapted a genetically engineered, autochthonous mouse model of sarcomagenesis to investigate the process of cancer immunoediting. This system allows us to monitor the onset and growth of immunogenic and non-immunogenic tumours induced in situ that harbour identical genetic and histopathological characteristics. By comparing the development of such tumours in immune-competent mice with their development in mice with broad immunodeficiency or specific antigenic tolerance, we show that recognition of tumour-specific antigens by lymphocytes is critical for immunoediting against sarcomas. Furthermore, primary sarcomas were edited to become less immunogenic through the selective outgrowth of cells that were able to escape T lymphocyte attack. Loss of tumour antigen expression or presentation on major histocompatibility complex I was necessary and sufficient for this immunoediting process to occur. These results highlight the importance of tumour-specific-antigen expression in immune surveillance and, potentially, immunotherapy.

To determine whether T lymphocytes influence tumour development, we adapted a mouse model of human soft tissue sarcomagenesis driven by Cre/LoxP-regulated expression of oncogenic K-rasG12D and deletion of p53 to allow for the control of tumour immunogenicity. Sarcomas were induced in either immune-competent KrasLSL-G12D/+;p53fl/fl;Rag2+/− (KP) or lymphocyte-deficient KrasLSL-G12D/+;p53fl/fl;Rag2−/− (KPR) mice by intramuscular injection of lentiviral vectors that expressed Cre recombinase alone (Lenti-x). To induce sarcomas with potentially immunogenic antigens, we used vectors that also expressed the T-cell antigens SIYRYYGL (SIY) and two antigens from ovalbumin (SIINFEKL (SIN, OVA257–264) and OVA323–339) fused to the carboxy terminus of luciferase (Lenti-LucOS). Intramuscular injection of Lenti-LucOS led to tumour formation in 100% of KPR mice but only 27% of KP mice by 140 days (Fig. 1a, P < 0.0001). Additional sarcomas ultimately developed in KP mice but with dramatically delayed kinetics (latency of 194.8 ± 43.4 days) compared with KPR mice (73.6 ± 4.3 days) (Fig. 1c, P < 0.02). We also observed a difference in the penetrance of sarcoma development in KPR versus KP mice by 140 days with Lenti-x (89% versus 43%, respectively), although the difference was less dramatic than that observed with Lenti-LucOS (Fig. 1b, P < 0.0005).
This suggests that, in this model, tumour immunosurveillance may not necessitate the introduction of highly immunogenic tumour-specific antigens (TSAs). The observed immunosurveillance against Lenti-x tumours could result from the lentiviral infection required to induce tumours, the acquisition of TSAs during tumour development, or the immunogenicity of Cre itself. However, in a previous study, we found that Cre was not highly immunogenic when expressed in developing lung adenocarcinomas. Although Lenti-x-induced sarcoma development was slightly delayed in immune-competent (KP) mice (114.9 days in KP versus 79.5 days in KPR mice), the difference was not significant (Fig. 1c, P = 0.11). The increased latency that is specific to Lenti-LucOS tumours may be the result of an equilibrium between replicating tumour cells and T cells that recognize antigens expressed from the LucOS vector and restrain tumour progression.

Rag2 (recombination activating gene 2) deficiency prevents both T 
and B lymphocyte development and, therefore, could have pleiotropic 
effects on the immune response to tumour antigens. To specifically test 
Figure 1 | Sarcoma formation in immunodeficient compared to wild-type mice occurs with increased penetrance and reduced latency. a, b, KPR or KP mice were injected intramuscularly with Lenti-LucOS (a) or Lenti-x (b) and the onset of palpable sarcomas was monitored. c, Time to palpable tumour formation with Lenti-LucOS or Lenti-x in KPR (circles) or KP (triangles) mice. d, Sarcoma formation in KP mice either untreated or treated with anti-CD4 and anti-CD8 antibodies beginning coincident with or 14 days after Lenti-LucOS injection. e, Sarcoma onset after injection of KP-LSIY or KP littermates with Lenti-LucS. The percentage of total mice (n) with sarcomas by 140 days (grey boxes) is indicated.
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the significance of T-cell responses, we treated mice with antibodies 
against CD4 and CD8 to deplete T cells concurrent with, or subsequent 
to, intramuscular injection of Lenti-LucOS. T-cell depletion at tumour 
initiation, or even 14 days after tumour initiation, led to sarcoma 
development with complete penetrance and early onset similar to 
KPR mice (Fig. 1d, P = 0.001 and P = 0.013 compared to untreated, respectively). To specifically test the importance of CD8+ T cells that recognize the model TSAs, we made use of a regulatable luciferase-SIY fusion gene engineered into the murine Rosa26 locus (R26LSL-LSIY). These mice develop specific tolerance to luciferase and SIY due to weak thymic expression and deletion of reactive T cells (Supplementary Fig. 1). KrasLSL-G12D/+;p53fl/fl;R26LSL-LSIY/+ (KP-LSIY) mice injected with Lenti-LucS, a lentiviral vector that expresses Cre and SIY fused to luciferase, were more susceptible to sarcoma formation and developed tumours earlier than KP littermates (Fig. 1e, P = 0.058). Thus, lymphocyte-mediated protection from sarcoma formation requires CD8+ T cells that respond to non-self antigens expressed in tumours.

A key advantage of this conditional, genetically engineered cancer model over carcinogen-induced models is the capacity to track endogenous T cells specific for tumour antigens during primary tumour development. We used SIY- and SIN-loaded MHC I/Kb reagents to track tumour-reactive CD8+ T cells by flow cytometry. Only mice with Lenti-LucOS sarcomas harboured CD8+ T cells specific to SIY and SIN in the lymph nodes nearest the tumour site as well as in the spleen (Fig. 2a, b). These CD8+ T cells appeared to be completely functional because they produced both IFN-γ and TNF-α upon stimulation (Fig. 2a–d). Interestingly, this contrasts sharply with results from an analogous model of lung adenocarcinoma in which the activity of T cells responding to the same tumour antigens was very weak, suggesting that different tumour types may use different mechanisms to escape immune attack. We also investigated whether KP mice that did not develop sarcomas after injection with Lenti-LucOS harboured antigen-specific T cells, because such T cells could have protected these mice from sarcoma development. Indeed, we detected fully functional antigen-specific T cells in these mice (Fig. 2c, d and Supplementary Fig. 1), demonstrating that T cells specific to these model TSAs are functional and probably provide significant protection against the development of Lenti-LucOS sarcomas.

Pivotal experiments using methylcholanthrene (MCA)-induced sarcomas revealed that tumours generated in immune-compromised mice, and thus not immunoedited, are more susceptible to rejection upon transplantation into immune-competent mice. To assess whether autochthonous sarcomas driven by targeted genetic mutations would also display an unedited phenotype, we transplanted independently derived sarcomas from KPR or KP mice into either wild-type or Rag2−/− mice. Whereas freshly isolated Lenti-LucOS-induced tumours (or cell lines) generated in KP mice grew similarly upon transplantation into either wild-type or Rag2−/− mice, most Lenti-LucOS tumours generated in KPR mice were rejected (1/7) or had significantly delayed
growth (4/7) (Fig. 3a, b and Supplementary Fig. 2). These results 
recapitulate the original findings from carcinogen-induced sarcomas 
in a genetically engineered mouse model of sarcomagenesis. 
Figure 2 | Functional T-cell responses are generated against antigens expressed in sarcomas. a, Top: percentage of gated CD8+ cells specific for SIY and SIN in the inguinal lymph nodes either draining (DLN) or peripheral to (PLN) Lenti-x or Lenti-LucOS tumours. Bottom: IFN-γ and TNF-α cytokine production in SIY+SIN-stimulated CD8+ T cells from mice analysed above. b, Analysis of splenocytes as in a. c, d, Cumulative data depicting the percentage of SIY- and SIN-specific T cells that were IFN-γ+ TNF-α+ from lymph nodes (c) or spleens (d) of KP mice infected with Lenti-LucOS that developed a 'sarcoma' or were 'tumour-free' at 170 days. T cells reactive to SIY were analysed four months after challenge with WSN-SIY (influenza strain expressing SIY). Data represent analysis of 3-4 mice per group, mean ± s.e.m.
Next we wanted to determine whether Lenti-x-induced sarcomas, which lack the strong T-cell antigens from LucOS, would yield similar results. Interestingly, Lenti-x tumours generated in KPR or KP mice grew equally well when transplanted into wild-type or Rag2−/− mice (Fig. 3c, d). It is noteworthy that, whereas autochthonous tumours initiated by Lenti-x appeared partially inhibited by an adaptive immune response (Fig. 1b), in the context of transplantation we found no evidence of immunoediting (Fig. 3c). This difference may be due to Rag-dependent innate immune cells (NKT and γδ T cells) that recognize stress or inflammatory ligands. These cells may be sufficient to eliminate a limited number of nascent tumour cells in the context of transformation by lentiviral infection, but not in response to the transplantation of
Figure 3 | Cancer immunoediting phenotypes require the presence of potent T-cell antigens. Transplanted tumour growth of Lenti-LucOS-induced sarcomas generated in KPR (a) or KP (b) mice and Lenti-x-induced sarcomas generated in KPR (c) or KP (d) mice. Left column, representative tumour growth curves from two different primary tumours (coloured red or blue) after transplantation into Rag2−/− (dashed lines) or wild-type (WT, solid lines) mice. Right column, comparison of the mean tumour volume ± s.e.m. for all tumours transplanted. ∅ indicates no detectable mass. See Supplementary Fig. 2 for growth curves of tumour lines.
fully developed tumours. Nevertheless, we suggest that Lenti-x sarcomas from KPR mice grew unabated after transplantation into KP mice because immunoediting by T lymphocytes requires potent TSAs, which these tumours lack. The observed immunogenicity of carcinogen-induced sarcomas generated in immune-compromised mice may be due to the de novo generation of potent tumour neoantigens during transformation with mutagens. Importantly, in another study reported in this issue, somatically mutated spectrin-β2 in an MCA-induced sarcoma was found to act as a potent neoantigen that drove the immunoediting process. In an attempt to introduce immunogenic mutations in Lenti-x tumours, we treated cell lines from these tumours with MCA in vitro. Interestingly, such treatment rarely yielded clones with increased immunogenicity (Supplementary Fig. 3). This may indicate that, although carcinogens can produce immunogenic mutations, such events are rare.

If cancer immunoediting by lymphocytes requires potent TSAs, then Lenti-LucOS-induced tumours that appear edited after forming in KP mice may have evaded an immune response by the selective outgrowth of cells lacking these potent antigens. To assess antigen expression, we measured luciferase activity in tumours. Whereas tumours from KPR mice were universally luciferase positive, tumours from KP mice had drastically reduced luciferase activity in all but one of six sarcomas (Fig. 4a, b). Interestingly, this sarcoma had significantly reduced expression of H-2Kb, the MHC class I allele responsible for presenting the SIY and SIIN antigens (Fig. 4c). Sarcomas from KP mice treated with anti-CD4 and anti-CD8 antibodies at tumour initiation also retained luciferase activity (5/6 sarcomas luc+, Fig. 4a). However, fewer sarcomas retained luciferase expression when mice were treated with anti-CD4 and anti-CD8 antibodies beginning 14 days after tumour initiation (1/5 sarcomas luc+), suggesting that immunoediting can occur very early during sarcoma development. Thus, by selectively eliminating cells that express potent TSAs, T lymphocytes drive the escape of tumour cells that either do not express potent antigens or cannot present the antigens to reactive T cells.

In a similar fashion to the antigen loss observed in autochthonous sarcomas, Lenti-LucOS-induced sarcomas from KPR mice lost antigen expression when transplanted into wild-type mice (Supplementary Fig. 4). Importantly, tumours that lost antigen expression after being passaged through wild-type mice grew comparably upon secondary transplantation into wild-type and Rag2−/− mice, whereas tumours passaged through Rag2−/− mice did not (Supplementary Fig. 4). To test whether antigen loss was sufficient to provide a means of escape for Lenti-LucOS sarcomas generated in KP mice, we reintroduced the LucOS antigens into sarcomas that had lost expression of the antigens after passage through wild-type mice (referred to as antigen− tumours). Indeed, re-expression of LucOS led to severely reduced tumour growth (Fig. 4d), indicating that loss of antigen expression was the primary means of tumour escape in this setting.

Epigenetic silencing of tumour antigen expression via DNA methylation could be responsible for antigen loss and tumour escape. To test this hypothesis, we treated cell lines that had lost luciferase expression after transplantation into immune-competent mice with 5-aza-2′-deoxycytidine (Aza), which reverses epigenetic silencing by inhibiting DNA methylation. In several lines tested, luciferase activity was restored with Aza treatment (Fig. 4e). Therefore, epigenetic silencing of tumour antigens may represent an important mechanism by which tumours can be edited in response to immune surveillance.

Here we have overcome many of the obstacles of carcinogen-induced models of cancer by using an autochthonous, genetically engineered model of sarcomagenesis to show that T-lymphocyte-driven tumour antigen loss is a critical means by which cancer immunoediting occurs in a primary tumour setting. Although this study was limited to investigating the role of anti-tumour immunity by T cells, this model could be adapted to investigate the role of other critical immune cells in cancer immunoediting, such as B cells or NK cells, by introducing surface-expressed or stress-related antigens, respectively, into tumours. This study resulted in two key discoveries. First, oncogene-driven, endogenous tumours can undergo immunoediting in a manner similar to carcinogen-driven tumours if engineered to express model TSAs. The immunogenicity of MCA-induced sarcomas is well documented, and may be a direct consequence of TSAs that arise from carcinogen-induced mutations of normal genes during tumour development. In contrast, cancers that arise spontaneously or by targeted genetic mutations in mice have been reported to be weakly immunogenic. However, the mutational requirements for tumorigenesis in humans may be greater than in mice, and thus it is possible that spontaneous or genetically engineered mouse models of cancer underestimate the mutational and antigenic load of most human cancers. This idea is supported by the second critical finding of this study: tumour immunogenicity is not a universal characteristic of cancer development. By obviating the need for carcinogens, we could induce sarcomas that potentially lacked potent TSAs. These tumours had significantly reduced immunogenicity despite no previous engagement with the adaptive immune system and hence no opportunity for immunoediting. These results provide the first (to our knowledge) experimental system to unify the apparently conflicting results obtained using either carcinogen-induced or genetically targeted mouse models of cancer, by identifying TSAs as the critical determinants that invoke adaptive immunosurveillance and immunoediting. We propose that identifying and characterizing TSAs in human cancers may be critical for the generation of more effective anti-cancer immunotherapies in patients.

Figure 4 | Immunoediting occurs by selecting for tumour cells that do not express targeted antigens. a, Representative luciferase activity of Lenti-LucOS- and Lenti-x-induced sarcomas in KPR, KP or anti-CD4/CD8-treated KP mice. b, Luciferase expression in Lenti-LucOS-induced sarcoma cell lines generated in KPR or KP mice. c, Freshly harvested sarcomas cultured with IFN-γ (solid line) or untreated (dashed line) were analysed for H-2Kb surface expression (shaded, control antibody). d, Growth of two independent tumours (3919 and 4070) that had lost antigen expression (antigen−, blue lines) or the same tumour lines after reintroduction of LucOS (antigen−-LucOS, red lines). Mean tumour volume ± s.e.m. after transplantation into three wild-type mice (solid lines) or one Rag2−/− mouse (dashed lines). e, Relative luciferase activity (compared to the primary sarcoma) ± 5-aza-2′-deoxycytidine (Aza) of Lenti-LucOS sarcomas from KPR mice (Primary, black columns) that were passaged through Rag2−/− (Passaged through Rag−/−, grey columns) or wild-type mice (Passaged through Rag+/+, white columns). Mean ± s.e.m. from two experiments.


METHODS SUMMARY 


Experiments used mice of the 129S4/SvJae strain. All animal studies and procedures were approved by the Massachusetts Institute of Technology’s Committee for Animal Care. Sarcomas were induced in KP and KPR mice by intramuscular injection of the hind limb with replication-incompetent lentiviruses expressing Cre recombinase, as reported previously. To deplete T cells, anti-CD4 (GK1.5) and anti-CD8 (YTS169.4) antibodies were administered at a dose of 250 μg per mouse by i.p. injection once weekly for the duration of the experiment. Flow cytometry was performed as described. For transplantation experiments, 2 × 10° freshly isolated tumour cells or cultured tumour cells were transplanted subcutaneously into immune-competent or Rag2−/− mice of the 129S4/SvJae background. Tumour volumes were calculated by multiplying the length × width × height of each tumour. To detect luciferase activity, freshly explanted tumours or cell lines were lysed and mixed with luciferin reagent (Promega), and relative light units (RLU) were detected with a luminometer (MGM Instruments). Aza treatment used 1 μM 5-aza-2′-deoxycytidine for three days. In vivo bioluminescence images were acquired with the NightOWL II LB983 (Berthold Technologies) or the IVIS Spectrum (Xenogen) after intraperitoneal injection of 1.5 mg beetle luciferin (Promega). Statistical analyses used unpaired two-tailed Fisher exact probability tests or Student’s t-tests.


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS


Mice and tumour induction. 129S4/SvJae strains backcrossed for 8 generations were used for all experiments. Trp53flox/flox mice were provided by A. Berns, KrasLSL-G12D/+ mice were generated in our laboratory, and Rag2−/− mice were purchased from The Jackson Laboratory. Sarcomas were induced in KP and KPR mice by intramuscular injection of the left hind limb with replication-incompetent lentiviruses expressing Cre recombinase, as reported previously. Mice were monitored twice weekly for palpable sarcoma formation beginning 50 days after intramuscular injection. All animal studies and procedures were approved by the Massachusetts Institute of Technology’s Committee for Animal Care.

Lentiviral production. Lentivirus was produced by transfection of 293T cells with Δ8.2 (gag/pol), CMV-VSV-G and the various transfer vectors expressing Cre, as described (ref. 27).

Antibody depletion. Anti-CD4 (GK1.5) and anti-CD8 (YTS169.4) antibodies were administered at a dose of 250 μg per mouse by i.p. injection once weekly for the duration of the experiment.

Preparation, culture and transplantation of primary sarcomas. Primary sarcomas were explanted, and single-cell suspensions were generated by mincing and digesting the tissues for ~1 h at 37 °C in 125 U ml−1 collagenase type I (Gibco), 60 U ml−1 hyaluronidase (Sigma) and 2 mg ml−1 collagenase/dispase (Roche), followed by passage through a 70-μm filter. Subcutaneous transplantation used 2 × 10° cells from freshly isolated tumour cells or cell lines from primary autochthonous tumours that were trypsinized and washed three times in plain DME medium. Transplant recipients were immune-competent or Rag2−/− mice on the 129S4/SvJae background from the same mouse colony used to generate the autochthonous tumours. Subcutaneously transplanted tumour volumes were calculated by multiplying the length × width × height of each tumour. In Fig. 3, the mean volume ± s.e.m. of each tumour line is depicted after transplantation into wild-type mice (WT, open columns) at the time point when the same tumour line reached a volume of 1,000 mm³ in the Rag2−/− transplanted mice (Rag2−/−, filled columns).
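The volume calculation and the Fig. 3 comparison point described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code. The function names and the per-day list layout are hypothetical. Using the mean Rag2−/− volume is also an assumption; when only one Rag2−/− mouse was used, that mean is simply the single value.

```python
import math
from statistics import mean, stdev


def tumour_volume(length_mm: float, width_mm: float, height_mm: float) -> float:
    """Tumour volume in mm^3 as the plain product length x width x height."""
    return length_mm * width_mm * height_mm


def mean_sem(values):
    """Mean and standard error of the mean (sample s.d. / sqrt(n))."""
    m = mean(values)
    sem = stdev(values) / math.sqrt(len(values)) if len(values) > 1 else 0.0
    return m, sem


def wt_volume_at_rag_threshold(days, rag_volumes, wt_volumes, threshold_mm3=1000.0):
    """Hypothetical helper: WT mean and s.e.m. on the first day the Rag2-/-
    volume reaches the threshold (1,000 mm^3 in the Methods).

    days        -- measurement days in ascending order
    rag_volumes -- per-day lists of volumes in Rag2-/- recipients
    wt_volumes  -- per-day lists of volumes in wild-type recipients
    Returns (day, mean, sem), or None if the threshold is never reached.
    """
    for day, rag, wt in zip(days, rag_volumes, wt_volumes):
        if mean(rag) >= threshold_mm3:
            m, sem = mean_sem(wt)
            return day, m, sem
    return None
```

For example, `tumour_volume(12.0, 10.0, 8.0)` gives 960.0 mm³. Because the product of three diameters is used without an ellipsoid correction factor, volumes are comparable only between tumours measured the same way.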

Flow cytometry. Cell suspensions from lymphoid organs were prepared by mechanical disruption between frosted slides. Cells were then stained with antibodies for 20–30 min after treatment with FcBlock (BD Pharmingen). Anti-CD8α (53-6.7), anti-IFN-γ (XMG1.2), anti-TNF (MP6-XT22) and DimerX I (Dimeric Mouse H-2Kb:Ig) were from BD Pharmingen. All antibodies were used at 1:200 dilution. Peptide-loaded DimerX reagents were prepared as directed and used at 1:75 dilution. To improve the sensitivity of the DimerX reagent, we used both PE- and APC-labelled dimers to co-stain CD8+ T cells. Propidium iodide was used to exclude dead cells. Cells were read on a FACSCalibur and analysed using FlowJo software (Tree Star). In Fig. 2c, d, data were determined by comparing the fraction of CD8+ cells in duplicate samples stained with Kb dimers or for cytokine production. Values exceed 100% because the Kb dimers do not detect all antigen-specific cells. In Fig. 4c, freshly harvested sarcomas were cultured for 24 h in the presence of 10 U IFN-γ (solid line) or left untreated (dashed line) and analysed for H-2Kb surface expression (shaded, control antibody).

Cytokine production. Cells were resuspended in the presence or absence of SIYRYYGL and SIINFEKL peptides in OPTI-MEM I (Gibco) supplemented with GolgiPlug (BD Pharmingen) for ~4 h at 37 °C, 5% CO2. Cells were then fixed and stained for intracellular cytokines using the Cytofix/Cytoperm kit (BD Biosciences).

Luciferase detection. Freshly explanted tumours or cell lines were lysed in Cell Culture Lysis Reagent and mixed with Luciferase Assay Reagent according to the manufacturer’s instructions (Promega), and relative light units (RLU) were detected using the Optocomp I luminometer (MGM Instruments). RLUs were standardized by the total amount of protein (Bio-Rad Protein Assay) in each sample. In vivo bioluminescence images were acquired with the NightOWL II LB983 (Berthold Technologies) or the IVIS Spectrum (Xenogen Corp.) after intraperitoneal injection of 1.5 mg beetle luciferin (Promega).

5-aza-2′-deoxycytidine treatment. Tumour cell lines were plated at low confluency (2 × 10° cells per well of a 6-well plate) and treated with 1 μM 5-aza-2′-deoxycytidine, replaced daily for three consecutive days, and then analysed for luciferase activity.
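The protein standardization of luciferase readings, and the "relative luciferase activity" of Fig. 4e, amount to two divisions. The sketch below makes two assumptions: protein is measured in micrograms, and relative activity means a sample's protein-normalized RLU divided by that of its primary sarcoma. The function names and example values are hypothetical.

```python
def normalized_rlu(rlu: float, protein_ug: float) -> float:
    """Relative light units per microgram of total protein (assumed unit)."""
    if protein_ug <= 0:
        raise ValueError("protein amount must be positive")
    return rlu / protein_ug


def relative_activity(sample_rlu: float, sample_protein_ug: float,
                      primary_rlu: float, primary_protein_ug: float) -> float:
    """Protein-normalized luciferase activity of a sample relative to its
    primary sarcoma (1.0 means no change from the primary tumour)."""
    return normalized_rlu(sample_rlu, sample_protein_ug) / normalized_rlu(
        primary_rlu, primary_protein_ug
    )
```

With made-up numbers, take a primary sarcoma at 50,000 RLU in 25 μg of protein. A passaged line at 1,000 RLU in 20 μg then has a relative activity of 0.025, and the same line at 30,000 RLU in 20 μg after Aza treatment has 0.75. Normalizing to protein first keeps differences in lysate yield from masquerading as differences in antigen expression.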

Influenza. WSN-SIY (20 p.f.u. per mouse) was provided by J. Chen. FACS analysis was performed four months after intratracheal infection.

Statistical analyses. P-values were generated using unpaired two-tailed Fisher 
exact probability tests or Student’s t-tests. 
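For count comparisons such as luciferase-positive versus luciferase-negative sarcomas, the two-tailed Fisher exact test can be computed without dependencies. With the row and column totals of a 2 × 2 table fixed, the top-left cell follows a hypergeometric distribution. The P value is the summed probability of every table at least as unlikely as the observed one. This is one common definition of the two-tailed test, and it is the one documented for SciPy's `scipy.stats.fisher_exact`. Unpaired Student's t-tests are available as `scipy.stats.ttest_ind`. The paper does not state which software was used, so the sketch below is illustrative only.

```python
from math import comb


def fisher_exact_two_tailed(a: int, b: int, c: int, d: int) -> float:
    """Two-tailed Fisher exact test for the 2 x 2 table [[a, b], [c, d]].

    With the margins fixed, the count in the top-left cell is hypergeometric.
    The P value sums the probabilities of every attainable table that is no
    more probable than the observed table.
    """
    row1 = a + b
    col1 = a + c
    n = a + b + c + d

    def ways(x: int) -> int:
        # Number of tables with x in the top-left cell, given the margins.
        return comb(col1, x) * comb(n - col1, row1 - x)

    observed = ways(a)
    lo = max(0, row1 - (n - col1))
    hi = min(row1, col1)
    # All weights share the denominator comb(n, row1), so the integer
    # comparison below is exact (no floating-point tolerance needed).
    tail = sum(w for w in (ways(x) for x in range(lo, hi + 1)) if w <= observed)
    return tail / comb(n, row1)
```

For Fisher's classic tea-tasting table, `fisher_exact_two_tailed(3, 1, 1, 3)` returns 34/70, about 0.486.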


27. DuPage, M., Dooley, A. L. & Jacks, T. Conditional mouse lung cancer models using 
adenoviral or lentiviral delivery of Cre recombinase. Nature Protocols 4, 
1064-1072 (2009). 
Galectin 8 targets damaged vesicles for autophagy to
defend cells against bacterial invasion 


Teresa L. M. Thurston, Michal P. Wandel, Natalia von Muhlinen, Agnes Foeglein & Felix Randow


Autophagy defends the mammalian cytosol against bacterial infection. Efficient pathogen engulfment is mediated by cargo-selecting autophagy adaptors that rely on unidentified pattern-recognition or danger receptors to label invading pathogens as autophagy cargo, typically by polyubiquitin coating. Here we show in human cells that galectin 8 (also known as LGALS8), a cytosolic lectin, is a danger receptor that restricts Salmonella proliferation. Galectin 8 monitors endosomal and lysosomal integrity and detects bacterial invasion by binding host glycans exposed on damaged Salmonella-containing vacuoles. By recruiting NDP52 (also known as CALCOCO2), galectin 8 activates antibacterial autophagy. Galectin-8-dependent recruitment of NDP52 to Salmonella-containing vesicles is transient and followed by ubiquitin-dependent NDP52 recruitment. Because galectin 8 also detects sterile damage to endosomes or lysosomes, as well as invasion by Listeria or Shigella, we suggest that galectin 8 serves as a versatile receptor for vesicle-damaging pathogens. Our results illustrate how cells deploy the danger receptor galectin 8 to combat infection by monitoring endosomal and lysosomal integrity on the basis of the specific lack of complex carbohydrates in the cytosol.

Galectins are β-galactoside-binding lectins that accumulate in the cytosol before being secreted via a leader-peptide-independent pathway. The best-characterized functions of galectins are performed extracellularly, where they bind glycans to modulate cellular behaviour. However, the occurrence of galectins in the cytosol, which under physiological conditions is devoid of complex carbohydrates, makes them prime candidates for a role as danger and/or pattern-recognition receptors. Galectin 3 (also known as LGALS3) accumulates on damaged bacteria-containing vesicles, although the functional consequences of its recruitment remain unknown. We screened a panel of human galectins for their ability to detect invasion by Salmonella enterica serovar Typhimurium. At 1 h post-infection (p.i.), galectin 3, 8 and 9 accumulated on about 10% of intracellular S. Typhimurium (Fig. 1a, b and Supplementary Fig. 1a), of which 90% were associated with LAMP1 (Supplementary Fig. 1b). Recruitment of galectins peaked between 1 h and 2 h p.i. (Supplementary Fig. 1c). As galectin 3, 8 and 9 were recruited to Salmonella-containing vesicles (SCVs), we used short interfering RNAs (siRNAs) to test whether their depletion causes hyperproliferation of S. Typhimurium. Cells lacking galectin 8 or NDP52, but not galectin 3 and/or 9, failed to suppress proliferation of S. Typhimurium (Fig. 1c and Supplementary Figs 2a–c and 3a). Microscopic analysis confirmed that the greater bacterial burden of cells lacking galectin 8 was caused by enhanced proliferation rather than differential uptake of bacteria (Supplementary Fig. 3b). Hyperproliferating bacteria in cells lacking galectin 8 appeared mainly in a LAMP1-negative compartment (Supplementary Fig. 3c), consistent with colonization of the cytosol. We conclude that galectin 8 is an antibacterial restriction factor.
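Fold replication, as plotted in Fig. 1c and Fig. 3g, can be sketched as the ratio of colony counts at a late and an early time point. The sketch rests on three assumptions not stated in this section. First, fold replication is the late count divided by the early count (for example 6 h over 2 h, the time points given for Fig. 3g). Second, duplicate plate counts are averaged per culture before the ratio is taken. Third, counts have already been corrected for dilution. The function names are hypothetical.

```python
from statistics import mean, stdev


def fold_replication(early_counts, late_counts) -> float:
    """Fold replication for one culture: mean late count / mean early count.

    early_counts, late_counts -- duplicate colony counts (assumed already
    corrected for dilution) from the early and late time points.
    """
    early = mean(early_counts)
    if early == 0:
        raise ValueError("no colonies at the early time point")
    return mean(late_counts) / early


def summarize_cultures(cultures):
    """Mean and s.d. of fold replication across replicate cultures.

    cultures -- list of (early_counts, late_counts) pairs, one per culture;
    at least two cultures are needed for a standard deviation.
    """
    folds = [fold_replication(early, late) for early, late in cultures]
    return mean(folds), stdev(folds)
```

Dividing by the early count corrects for differences in initial uptake. This matters for the conclusion above, because hyperproliferation has to be distinguished from differential entry of bacteria.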

As autophagy provides antibacterial protection to cells, the decoration of SCVs with galectins might be an autophagy-inducing signal, analogous to ubiquitin coating. We therefore tested binding of galectins to autophagy receptors that restrict the proliferation of S. Typhimurium, that is, NDP52, p62 and optineurin. We found in a luminescence-based mammalian interactome mapping (LUMIER) assay that galectin 8 and NDP52 interacted specifically (Fig. 2a and Supplementary Fig. 4a). Binding was confirmed by precipitating endogenous NDP52 with Flag-tagged galectin 8 (Fig. 2b).
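The LUMIER read-out in the Fig. 2a legend is the luciferase activity bound to beads relative to that present in lysates, and it reduces to a pair of ratios. In the sketch below, normalizing to a control co-expression is an assumption, for example a Flag-tagged protein not expected to bind. The function names and example values are also hypothetical.

```python
def binding_ratio(bead_rlu: float, lysate_rlu: float) -> float:
    """Fraction of luciferase activity recovered on anti-Flag beads."""
    if lysate_rlu <= 0:
        raise ValueError("lysate activity must be positive")
    return bead_rlu / lysate_rlu


def normalized_binding(bead_rlu: float, lysate_rlu: float,
                       ctrl_bead_rlu: float, ctrl_lysate_rlu: float) -> float:
    """Binding ratio of a Flag-bait/luciferase-prey pair divided by that of a
    control pair (hypothetical normalization; under this assumption, values
    well above 1 point to specific binding)."""
    return binding_ratio(bead_rlu, lysate_rlu) / binding_ratio(
        ctrl_bead_rlu, ctrl_lysate_rlu
    )
```

Consider a prey that recovers 5,000 of 100,000 lysate RLU on beads, against a control that recovers 250 of 100,000. That prey is enriched 20-fold. Dividing by lysate activity first keeps differences in expression level of the luciferase fusions from being read as differences in binding.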

SCVs double labelled by endogenous galectin 8 and NDP52 were prominent in Salmonella-infected HeLa cells (Fig. 2c). In cells expressing yellow fluorescent protein (YFP)-tagged galectins, the majority of galectin-positive SCVs had accumulated NDP52 (Fig. 2d and Supplementary Fig. 5a). Furthermore, at 1 h p.i. NDP52 and galectin 8 co-localized tightly in a pattern distinct from p62 or ubiquitin ‘microdomains’ (Supplementary Fig. 5b, c).

Figure 1 | Galectin 8 responds to infection by S. Typhimurium and restricts bacterial proliferation. a, b, Analysis of HeLa cells stably expressing YFP fused to the indicated galectins and infected with S. Typhimurium for 1 h. a, Percentage of bacteria coated by the indicated galectins. YFP-positive bacteria were counted by microscopy. Mean and standard deviation (s.d.) of triplicate HeLa cultures, n > 100 bacteria per coverslip. b, Confocal micrographs. Arrowheads, bacteria shown in insets. DAPI, 4′,6-diamidino-2-phenylindole. c, Kinetics of fold replication for S. Typhimurium in HeLa cells transfected with the indicated siRNAs. Bacteria were counted on the basis of their ability to form colonies on agar plates. Mean and s.d. of triplicate HeLa cultures and duplicate colony counts. siRNAs are further characterized in Supplementary Fig. 2a–c. **P < 0.01, Student’s t-test. Scale bar, 10 μm.

Figure 2 | Galectin 8 binds NDP52. a, LUMIER binding assay: normalized ratio between luciferase activity bound to beads and present in lysates. Lysates of 293ET cells expressing NDP52, p62 or optineurin, each fused to luciferase, and the indicated Flag-tagged galectins were incubated with anti-Flag beads. Flag-tagged proteins are further characterized in Supplementary Fig. 4a. b, Lysates of 293ET cells expressing Flag-tagged proteins as indicated were immunoprecipitated with anti-Flag beads. Lysates and immunoprecipitates (IP) were blotted for the presence of Flag-tagged proteins and endogenous NDP52. c, Confocal images of HeLa cells infected with S. Typhimurium for 1 h and stained with antisera against NDP52 and galectin 8. Arrowheads, bacteria shown in insets. d, Co-localization of NDP52 with galectin-positive bacteria in HeLa cells stably expressing YFP fused to the indicated galectins, infected with S. Typhimurium and stained with NDP52 antiserum 1 h after infection. Mean and s.d. of duplicate HeLa cultures, n > 100 bacteria per coverslip, representative of two independent experiments. Scale bar, 10 μm.

To characterize further the interaction between galectin 8 and NDP52, we determined their respective binding sites. Galectin 8 contains two carbohydrate-recognition domains (CRDs) (Supplementary Fig. 6a). NDP52 bound galectin-8(229–360), that is, amino acids 229–360, equivalent to the second CRD, but not galectin-8(1–228) (Supplementary Fig. 6b). NDP52 harbours a SKICH domain, a coiled-coil-forming region and a ubiquitin-binding zinc finger (Supplementary Fig. 6a). Galectin 8 bound NDP52(1–393) but not NDP52(1–370) (Supplementary Fig. 6c). The NDP52 fragment spanning residues 370–393 is therefore essential for binding galectin 8. This fragment, as well as NDP52(372–390), purified as GST-fusion proteins, bound galectin 8 (Supplementary Fig. 6d). A point mutation within NDP52(372–390) (L374A) abrogated binding to galectin 8 without compromising binding to ubiquitin when introduced into full-length NDP52 (Supplementary Fig. 6d, e). Binding of galectin 8 to NDP52 is direct, as the purified proteins interacted (Supplementary Fig. 6f).

To determine whether one partner of the NDP52–galectin-8 complex recruits the other, we analysed the accumulation of galectins on SCVs in cells depleted of NDP52 or TBK1 (Fig. 3a and Supplementary Fig. 2). Galectin 3, 8 and 9 redistributed normally to SCVs in siRNA-treated cells. In contrast, in cells depleted of galectin 8, NDP52 did not localize to SCVs at 1 h p.i., a phenotype that was complemented upon expression of siRNA-resistant galectin 8 (Fig. 3b and Supplementary Fig. 7). Cells lacking galectin 3 or galectin 9 had no defect in recruiting NDP52 to SCVs. The recruitment of NDP52 to SCVs is therefore specifically mediated by galectin 8, whereas NDP52 and TBK1 are dispensable for the accumulation of galectins on SCVs.

Rupture of SCVs exposes the cytosol to host glycans and microbial carbohydrates, either or both of which may cause galectin 8 accumulation at SCVs. The requirement for carbohydrate binding by galectin 8 was tested using point mutations in either CRD. In contrast to
galectin-8(R232H), galectin-8(R69H) did not accumulate at SCVs, showing that the amino-terminal CRD is required for carbohydrate-dependent recruitment of galectin 8 to SCVs (Fig. 3c). To test whether the carbohydrates detected by galectin 8 are of microbial origin, binding of recombinant galectin 8 to bacteria in vitro was analysed. Galectin 8 did not bind to S. Typhimurium but stained blood-group-B-positive bacteria (Escherichia coli strain O86) (Fig. 3d), suggesting that galectin 8, when accumulating on SCVs, recognizes host glycans. The occurrence of galectin-8 ligands in host cells was confirmed by staining HeLa cells with recombinant galectin 8 (Fig. 3d). Direct evidence that host glycans recruit galectins to SCVs was obtained from experiments with CHO-Lec3.2.8.1 cells, which lack mature glycans and in which recruitment of galectins to SCVs was severely impaired (Fig. 3e). The detection of host glycans on damaged vesicles by galectin 8 suggests that it is not a receptor specific for S. Typhimurium. We therefore tested whether sterile damage to vesicles is detected by galectins. Osmotic damage of endosomes induced dense puncta formed by galectin 3, 8 and 9 but not by galectin 1 (Fig. 3f and Supplementary Fig. 8). Damage to lysosomes by glycyl-L-phenylalanine 2-naphthylamide (GPN) treatment resulted in the initial loss of LysoTracker staining, followed by the appearance of galectin 3, 8 and 9 speckles (Supplementary Fig. 9a). In contrast to damaged SCVs and endosomes, burst lysosomes were also detected by galectin 1, suggesting compartment-specific differences in the distribution of galectin ligands. GPN failed to induce speckles of galectin-8(R69H) (Supplementary Fig. 9b), indicating that binding of glycans to the N-terminal CRD of galectin 8 is required to detect lysosomal damage. The capacity of galectin 3, 8 and 9 to detect vesicle damage by binding exposed host glycans suggests that they can sense the invasion of cells by a wide range of vesicle-damaging pathogens. Indeed, galectin 3, 8 and 9 also accumulated around Gram-positive Listeria monocytogenes and Gram-negative Shigella flexneri (Supplementary Fig. 10), showing that these galectins detect the invasion of cells by phylogenetically distant bacteria. We conclude that galectin 3, 8 and 9 are danger receptors that sense the exposure of host glycans on ruptured membranes and thereby monitor the integrity of the endosomal/lysosomal compartment.
Figure 3 | Galectin 8 is a danger receptor that senses cytosolic host glycans and recruits NDP52 to restrict Salmonella proliferation. a, Percentage of S. Typhimurium coated by the indicated galectins. HeLa cells stably expressing YFP-tagged galectins were treated with the indicated siRNAs. YFP-positive bacteria were counted by microscopy at 1 h p.i. siRNAs are further characterized in Supplementary Fig. 2. b, Analysis of HeLa cells treated with the indicated siRNAs and stained with NDP52 antiserum. NDP52-positive bacteria were counted by microscopy 1 h after infection with S. Typhimurium. siRNAs are further characterized in Supplementary Fig. 2. c, Percentage of bacteria coated by the indicated galectin 8 alleles. HeLa cells stably expressing the indicated galectin 8 alleles fused to YFP were infected with S. Typhimurium. YFP-positive S. Typhimurium were counted by microscopy at 75 min p.i. WT, wild type. d, Binding of galectin 8 to bacteria and HeLa cells. The indicated bacteria and HeLa cells were incubated with His-GST-ubiquitin (Ub), His-GST-galectin 8 or buffer as indicated, followed by murine anti-His antibody and PE-labelled anti-mouse serum. e, Percentage of S. Typhimurium coated by the indicated galectins. Wild-type CHO cells and mutant Lec3.2.8.1 cells stably expressing YFP-tagged galectins were infected with S. Typhimurium. YFP-positive bacteria were counted by microscopy at 1 h p.i. f, Confocal images of HeLa cells expressing the indicated YFP-tagged galectins. Cells were left untreated or were exposed to hypertonic conditions, with or without (w/o) PEG as indicated, followed by hypotonic shock. g, Fold replication of S. Typhimurium in HeLa cells expressing the indicated galectin 3 variants and transfected with the indicated siRNAs. At 2 h and 6 h after infection, cells were lysed and bacteria counted on the basis of their ability to form colonies on agar plates. Galectin 3 proteins are further characterized in Supplementary Fig. 4b. Mean and s.d. of duplicate coverslips (a–c, e) or triplicate HeLa cultures and duplicate colony counts (g). >100 bacteria counted per coverslip. Data are representative of at least two repeats. *P < 0.05, Student’s t-test. Scale bar, 10 μm.

To test whether the recruitment of NDP52 to SCVs is essential for the antibacterial function of galectin 8, we depleted cells of galectin 8 and targeted NDP52 artificially to SCVs by fusing it to galectin 3 (Fig. 3g and Supplementary Fig. 4b). S. Typhimurium hyperproliferated in galectin-8-depleted cells, whereas expression of galectin 3 fused to NDP52 restored the cells’ restrictive capacity. Artificial targeting of NDP52 to SCVs therefore compensates for the lack of galectin 8, which strongly suggests that the recruitment of NDP52 via galectin 8 is essential to efficiently antagonize bacterial invasion.

NDP52 restricts the proliferation of S. Typhimurium by targeting bacteria for autophagy. As galectin 8 recruits NDP52 to SCVs, we investigated whether galectin 8 is required upstream of NDP52 for the induction of antibacterial autophagy. First, we confirmed that at 1 h p.i., S. Typhimurium that had been sensed by galectin 8, and therefore had acquired an NDP52 coat, were taken up into LC3-positive autophagosomes (Fig. 4a). Such an outcome was predicted from the pairwise co-localization of NDP52 with galectin-8- and LC3-positive bacteria (Fig. 2d and Supplementary Fig. 11). We then tested whether depletion of galectin 8 impairs autophagy of S. Typhimurium. In the absence of galectin 8, fewer bacteria were targeted by LC3 (Fig. 4b) and, of the remaining LC3-positive bacteria, fewer had accumulated NDP52 (Supplementary Fig. 11). In contrast, galectin-8 recruitment to SCVs did not require autophagy, as it occurred undisturbed in ATG5−/− fibroblasts (Supplementary Fig. 12). We conclude that the danger receptor galectin 8, by recruiting NDP52, directs autophagy towards invading bacteria.

The recruitment of NDP52 to invading bacteria is mediated by two signals: the newly discovered carbohydrate-dependent galectin-8 pathway and the previously known ubiquitin-dependent pathway. Their differential contribution to the recruitment of NDP52 to S. Typhimurium was investigated by analysing NDP52 mutants selectively disabled to bind galectin 8 and/or ubiquitin. For accurate scoring, NDP52ΔSKICH was used, as this truncated allele is distributed diffusely throughout the cytosol. NDP52ΔSKICH associated with SCVs at all time points investigated (Fig. 4c). Deleting the carboxy-terminal ubiquitin-binding zinc finger from this allele impaired the recruitment of NDP52 to S. Typhimurium at late but not early time points. In contrast, NDP52ΔSKICH(L374A), which lacks affinity for galectin 8 but not ubiquitin (Supplementary Fig. 6e), did not co-localize with bacteria at 1 h p.i. but accumulated progressively over time (Fig. 4c). NDP52 is therefore recruited to SCVs in two phases: an early, transient surge driven by galectin 8 and a later wave dependent on ubiquitin. The kinetics of galectin 8 and ubiquitin recruitment to SCVs support this model, as SCVs are marked by galectin 8 only at early time points, whereas ubiquitin marks persist (Fig. 4d). Direct evidence for early galectin-8-dependent and late galectin-8-independent recruitment of NDP52 to S. Typhimurium was obtained from cells depleted of galectin 8, in which NDP52 and bacteria co-localized at 4 h but not at 1 h p.i. (Fig. 4e). The L374A allele lacking the zinc finger, deficient in binding to both ubiquitin and galectin 8, did not translocate to SCVs at any time point tested (Fig. 4c). Taken together, NDP52 relocates to SCVs in response to two signals, which are active against bacteria at different stages of invasion. The early response to invading bacteria requires the galectin-8-dependent pathway, whereas the zinc-finger-dependent pathway dominates at later time points.

Figure 4 | The antibacterial effect of galectin 8 is mediated by autophagy. a, Confocal micrograph of HeLa cells expressing GFP-LC3 and mCherry-galectin-8, stained for NDP52 1 h after infection with S. Typhimurium. The lower right panel contains a fluorescence line scan along the yellow line in the merge inset. Arrowheads, bacteria shown in insets. b, Percentage of GFP-LC3-positive S. Typhimurium at 1 h p.i. in HeLa cells treated with the indicated siRNAs. c, Percentage of bacteria positive for NDP52. HeLa cells expressing the indicated NDP52 variants fused to YFP were infected with S. Typhimurium. d, Percentage of bacteria positive for the indicated markers. HeLa cells, either wild type or expressing YFP-galectin-8 as indicated, were infected with S. Typhimurium. Ubiquitin was detected by antibody staining. e, HeLa cells treated with the indicated siRNAs were infected with S. Typhimurium and stained for NDP52. Fluorescent bacteria were counted by microscopy at the indicated time points. Graphs, representing at least two independent repeats, show mean and s.d. of duplicate coverslips for which >200 bacteria were counted. *P < 0.05, Student’s t-test. Scale bar, 10 μm.

The galectin-8/NDP52 pathway sheds light on why most intracellular
bacteria avoid the cytosol and prefer vesicular compartments. The
cytosol seems to be protected by synergistic layers of antibacterial
defence that activate autophagy at distinct steps of the invasion process.
An early line of defence comprises the accumulation of diacylglycerol
on bacteria-containing vesicles, which subsequently become the target
of autophagy. Bacteria escaping the diacylglycerol pathway and exposing
host glycans on their damaged vacuoles are targeted by galectin 8
and NDP52, as described in this work. A third layer of defence coats
invading bacteria with polyubiquitin. Neither the enzymatic
machinery nor the substrate of this ubiquitylation has been identified,
although LRSAM1, a RING-finger E3 ubiquitin ligase, contributes to
autophagy of S. Typhimurium. Peptidoglycan and septin cages surrounding
cytosolic bacteria also contribute to autophagy. Defects in
this intricate network of autophagy-inducing defence pathways are
likely to cause susceptibility to infection and promote inflammation,
for example in Crohn’s disease. Galectin 8 is positioned strategically
at the cellular entry point for a variety of pathogens and is therefore
expected to have shaped pathogen evolution.


METHODS SUMMARY 


Galectins were cloned as YFP fusions and transduced into HeLa cells. HeLa cells
were infected with S. Typhimurium strain 12023. For confocal microscopy, cells
were fixed in paraformaldehyde. Bacterial growth was assessed by a gentamycin
protection assay. Knockdowns were accomplished with Stealth siRNAs. LUMIER
assays were performed as described. For flow-cytometric analysis, samples were
incubated with lysates of E. coli expressing His-GST fusion proteins, followed by
anti-His antibody and goat anti-mouse serum. Statistical testing was performed
using a two-tailed Student’s t-test.
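As a minimal sketch of the comparison described above, assuming equal variances (the classical Student's form) and hypothetical per-coverslip percentages, the two-tailed test can be computed with the Python standard library alone; the p-value integrates the t density numerically with Simpson's rule. The function names and example numbers are illustrative and are not taken from the paper.

```python
import math
from statistics import mean, variance


def t_pdf(x: float, df: int) -> float:
    """Probability density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t: float, df: int, steps: int = 2000) -> float:
    """P(|T| >= |t|) = 1 - 2 * integral of the density from 0 to |t| (Simpson's rule)."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps  # steps is even, as Simpson's rule requires
    s = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * s * h / 3)


def student_t_test(x: list[float], y: list[float]) -> tuple[float, int, float]:
    """Two-sample, two-tailed Student's t-test with pooled (equal) variances."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / math.sqrt(pooled * (1 / nx + 1 / ny))
    return t, df, two_tailed_p(t, df)


# Hypothetical percentages of marker-positive bacteria on duplicate coverslips.
control = [40.0, 44.0]
knockdown = [20.0, 22.0]
t, df, p = student_t_test(control, knockdown)
print(f"t = {t:.2f}, df = {df}, P = {p:.4f}")
```

Where SciPy is available, `scipy.stats.ttest_ind(x, y)` (equal variances by default) performs the same two-sided test; the hand-written version only makes each step explicit.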


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS

Antibodies. Antibodies were from QIAGEN (Penta-His), the Developmental 
Studies Hybridoma Bank (LAMP1), BD Transduction Laboratories (p62), Santa 
Cruz (GAL8-H80, TBK1-C100), R&D Systems (galectin 8), Transduction 
Laboratories (NDP52, for western blots), Enzo Life Science (ubiquitin FK2), Sigma
(ATG5, Flag M2), Dako (HRP-conjugated reagents), Jackson ImmunoResearch
Laboratories (goat anti-mouse-phycoerythrin (PE)) and Invitrogen (Alexa- 
conjugated anti-mouse and anti-rabbit antisera). The antiserum against NDP52 
used for immunofluorescence was a gift from J. Kendrick-Jones. 

Plasmids. M5P or closely related plasmids were used to produce recombinant
MLV for the expression of proteins in mammalian cells. pETM plasmids were
gifts from A. Geerlof. Open reading frames encoding human galectins, NDP52,
p62, optineurin, ubiquitin, ATG5 and LC3C were amplified by PCR or have been
described previously. Mutations were generated by PCR and verified by sequencing.

Bacteria. S. Typhimurium (strain 12023), provided by D. Holden, was grown
overnight in Luria broth (LB) and sub-cultured (1:33) in fresh LB for 3.5 h before
infection. HeLa cells in 24-well plates were infected with 20 µl of such cultures for
15 min at 37 °C. Following two washes with warm PBS and an incubation with
100 µg ml⁻¹ gentamycin for 2 h, cells were cultured in 20 µg ml⁻¹ gentamycin. To
enumerate intracellular bacteria, cells from triplicate wells were lysed in 1 ml cold
PBS containing 0.1% Triton-X-100. Serial dilutions were plated in duplicate on
TYE agar.
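The Methods do not state the plated volume or the dilution factors, so the sketch below treats both as parameters (the example values are hypothetical). It shows the usual arithmetic for turning duplicate plate counts from one dilution into colony-forming units (CFU) per well, using the 1 ml lysis volume given above, and then averaging over triplicate wells.

```python
from statistics import mean


def cfu_per_well(colony_counts: list[int], dilution_factor: float,
                 plated_volume_ml: float, lysate_volume_ml: float = 1.0) -> float:
    """CFU recovered from one well, from duplicate plates of a single dilution.

    dilution_factor: fold dilution of the plated lysate (e.g. 100 for 1:100)
    plated_volume_ml: volume spread per plate (assumed; not given in the Methods)
    lysate_volume_ml: volume the cells were lysed in (1 ml in the Methods)
    """
    cfu_per_ml = mean(colony_counts) * dilution_factor / plated_volume_ml
    return cfu_per_ml * lysate_volume_ml


def mean_cfu(triplicate_counts: list[list[int]], dilution_factor: float,
             plated_volume_ml: float) -> float:
    """Average CFU per well across triplicate wells plated at the same dilution."""
    return mean(cfu_per_well(c, dilution_factor, plated_volume_ml)
                for c in triplicate_counts)


# Hypothetical counts: three wells, duplicate plates, 1:100 dilution, 10 µl plated.
wells = [[50, 54], [48, 52], [55, 53]]
print(mean_cfu(wells, dilution_factor=100, plated_volume_ml=0.01))
```

Dividing CFU at a later time point by CFU at an early one gives a fold-replication value, which is one common way to express intracellular growth in a gentamycin protection assay.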

S. flexneri M90T, provided by C. Tang, was grown overnight in Tryptic Soy
Broth (TSB) and sub-cultured (1:100) in fresh TSB for 2 h before infection.
Bacteria were resuspended in warm IMDM and 100 µl were added to HeLa cells
in 24-well plates. Samples were centrifuged for 10 min at 670g. Following incubation
at 37 °C for 30 min, cells were washed with warm PBS and cultured in 100 µg
ml⁻¹ gentamycin for 2 h and 20 µg ml⁻¹ thereafter.

L. monocytogenes strain EGD (BUG 600), provided by P. Cossart, was grown
overnight in Brain Heart Infusion (BHI) at 30 °C with shaking. Five hundred
microlitres of diluted cultures (1:333) were added to HeLa cells in 24-well plates,
which were centrifuged at 670g for 10 min. Cells were incubated for 1 h at 37 °C,
washed with warm PBS and cultured in media supplemented with 100 µg ml⁻¹
gentamycin for the next hour and 20 µg ml⁻¹ gentamycin thereafter.

Cell culture. Cells were grown in IMDM supplemented with 10% FCS at 37 °C in
5% CO2. HeLa cells were obtained from the European Collection of Cell Cultures;
CHO and Lec3.2.8.1 cells from P. Stanley; and ATG5−/− MEFs
from N. Mizushima.

RNA interference. 5 X 10° cells per well were seeded in 24-well plates. The following
day, cells were transfected with 40 pmol of siRNA (Invitrogen) using
Lipofectamine 2000 (Invitrogen) in Optimem medium (Invitrogen). Optimem
was replaced with complete IMDM medium after 4 h and experiments were performed
after 3 days. siRNAs targeted the following sequences:
siNDP52 5′-UUCAGUUGAAGCAGCUCUGUCUCCC;
siGAL8 #36 5′-CCCACGCCUGAAUAUUAAAGCAUUU;
siGAL8 #38 5′-GGACAAAUUCCAGGUGGCUGUAAAU;
siGAL3 #669 5′-AAGCCCAAUGCAAACAGAAUUGCUU;
siGAL3 #670 5′-GAGAACAACAGGAGAGUCAUUGUUU;
siGAL9 #807 5′-GGCUUCAGUGGAAAUGACAUUGCCU;
siGAL9 #809 5′-UGUGCAACACGAGGCAGAACGGAGG;
siTBK1 5′-GACAGAAGUUGUGAUCACA(TT).

To render galectin 8 resistant to siGAL8 #38, silent mutations (shown in lower case),
GGAtAAgTTtCAaGTcGCaGTtAAT, were introduced by PCR and confirmed by
sequencing.
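Whether such a rescue construct escapes the siRNA without changing the protein can be checked mechanically: compare the siGAL8 #38 target with the mutated sequence and translate both with the standard genetic code. The sketch below assumes that the 25-nt target is read in the frame starting at its second base (inferred here because that frame yields the same peptide for both sequences; the Methods do not state the frame), and it reads the garbled character in the printed mutant sequence as T.

```python
# Standard genetic code; codons enumerated with bases in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def to_dna(seq: str) -> str:
    """Upper-case and convert RNA (U) to DNA (T) so both sequences compare directly."""
    return seq.upper().replace("U", "T")


def translate(seq: str, frame: int) -> str:
    """Translate complete codons starting at 0-based offset `frame`."""
    dna = to_dna(seq)
    return "".join(CODON_TABLE[dna[p:p + 3]] for p in range(frame, len(dna) - 2, 3))


def check_silent(target: str, mutant: str, frame: int) -> tuple[list[int], bool]:
    """Return the 0-based positions that differ and whether the peptide is unchanged."""
    a, b = to_dna(target), to_dna(mutant)
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    diffs = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    return diffs, translate(a, frame) == translate(b, frame)


SIGAL8_38_TARGET = "GGACAAAUUCCAGGUGGCUGUAAAU"  # siGAL8 #38 target (from the Methods)
RESISTANT = "GGATAAGTTTCAAGTCGCAGTTAAT"         # mutated galectin 8 sequence

diffs, silent = check_silent(SIGAL8_38_TARGET, RESISTANT, frame=1)
print(translate(SIGAL8_38_TARGET, 1), translate(RESISTANT, 1), diffs, silent)
```

In the assumed frame, each of the seven changes falls on the third base of a codon, which is why the encoded peptide stays the same while the siRNA target is disrupted.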

Immunoprecipitation and western blot. Post-nuclear supernatants from 2 X 10°
HeLa cells expressing Flag-tagged proteins were obtained following lysis
in buffer containing 150 mM NaCl, 0.1% Triton-X-100, 20 mM Tris-HCl (pH 7.4),
5 mM EDTA and proteinase inhibitors. Protein complexes were immunoprecipitated
for 2 h with Flag agarose before washing. Samples were eluted with Flag peptide and
separated on 4–12% denaturing Bis-Tris gels (Invitrogen). Visualization following
immunoblotting was performed using ECL detection reagents (Amersham
Bioscience).

LUMIER assays. LUMIER binding assays with pairs of putative interactors,
one fused to luciferase and the other fused to GST or Flag, were performed in
LUMIER lysis buffer (150 mM NaCl, 0.1% Triton-X-100, 20 mM Tris-HCl (pH
7.4), 5% glycerol, 5 mM EDTA and proteinase inhibitors). GST-fusion proteins
were immobilized on beads before incubation with the luciferase-tagged binding
partner for 2 h. For Flag-based assays, both proteins were expressed in 293ET cells
and immobilized using Flag-agarose. After washing in lysis buffer, proteins were
eluted with glutathione or Flag peptide in Renilla lysis buffer (Promega). Relative
luciferase activity is the ratio of the activity eluted from the beads to the activity
present in the lysates.
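The normalization described above reduces to one ratio per interacting pair. A minimal sketch with hypothetical luminometer readings in relative light units (RLU); the comparison against a GST-only control bait is an illustrative assumption, not a step specified in these Methods.

```python
def relative_luciferase_activity(eluted_rlu: float, lysate_rlu: float) -> float:
    """Luciferase activity eluted from the beads divided by that present in the lysate."""
    if lysate_rlu <= 0:
        raise ValueError("lysate activity must be positive")
    return eluted_rlu / lysate_rlu


# Hypothetical readings (RLU) for one luciferase-tagged prey and two baits.
readings = {
    "GST-bait": {"eluted": 1500.0, "lysate": 300000.0},
    "GST alone (control)": {"eluted": 60.0, "lysate": 300000.0},
}
ratios = {name: relative_luciferase_activity(r["eluted"], r["lysate"])
          for name, r in readings.items()}
fold_over_control = ratios["GST-bait"] / ratios["GST alone (control)"]
print(ratios, fold_over_control)
```

Dividing by the lysate activity corrects for differences in prey expression between samples, so pairs with unequal input levels can be compared.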

FACS. To examine the binding of galectin 8, bacteria in stationary phase or HeLa 
cells were washed in PBSF (PBS, 2% FCS) and incubated for 30 min at 4°C with 
cleared lysates of E. coli expressing His-GST fusion proteins, followed by incuba- 
tions with anti-His antibody and PE-conjugated goat anti-mouse serum. Bacteria 
were fixed in 4% paraformaldehyde before analysis. 

Sterile damage to vesicles. Endosomes were lysed by exposing cells for 10 min to
hypertonic medium (0.5 M sucrose in PBS, with or without 10% PEG1000), followed
by two PBS washes and an incubation in 60% PBS for 3 min. Cells were
returned to complete medium for 20 min, before being fixed in paraformaldehyde.
For live imaging of lysosomal damage, cells were labelled for 1 h with 100 nM
LysoTracker Red (Invitrogen), washed with PBS, incubated in Leibovitz L15
medium and, after acquisition of the first image, exposed to 333 µM GPN.

Microscopy. HeLa cells were grown on glass cover slips before infection. After
infection, cells were washed twice with warm PBS and fixed in 4% paraformaldehyde
in PBS for 30 min. Cells were washed twice in PBS and then quenched with PBS pH
7.4 containing 1 M glycine and 0.1% Triton-X-100 for 30 min before blocking for
30 min in PBTB (PBS, 0.1% Triton-X-100, 2% BSA). Cover slips were incubated
with primary followed by secondary antibodies for 1 h in PBTB before being mounted
in medium containing DAPI (Vector Laboratories). At least 100 events per slide
were scored in quantitative assays. Confocal images were taken with a 63×, 1.4
numerical aperture objective on either a Zeiss 710 or a Zeiss 780 microscope. Live
imaging was performed on a Nikon Eclipse Ti equipped with an Andor Revolution
XD system and a Yokogawa CSU-X1 spinning disk unit.
Brassinosteroid regulates stomatal development by
GSK3-mediated inhibition of a MAPK pathway 


Tae-Wuk Kim, Marta Michniewicz, Dominique C. Bergmann & Zhi-Yong Wang


Plants must coordinate the regulation of biochemistry and anatomy 
to optimize photosynthesis and water-use efficiency. The formation 
of stomata, epidermal pores that facilitate gas exchange, is highly 
coordinated with other aspects of photosynthetic development. The 
signalling pathways controlling stomatal development are not fully
understood, although mitogen-activated protein kinase (MAPK)
signalling is known to have key roles. Here we demonstrate in 
Arabidopsis that brassinosteroid regulates stomatal development 
by activating the MAPK kinase kinase (MAPKKK) YDA (also 
known as YODA). Genetic analyses indicate that receptor kinase- 
mediated brassinosteroid signalling inhibits stomatal development 
through the glycogen synthase kinase 3 (GSK3)-like kinase BIN2, 
and BIN2 acts upstream of YDA but downstream of the ERECTA 
family of receptor kinases. Complementary in vitro and in vivo 
assays show that BIN2 phosphorylates YDA to inhibit YDA 
phosphorylation of its substrate MKK4, and that activities of 
downstream MAPKs are reduced in brassinosteroid-deficient 
mutants but increased by treatment with either brassinosteroid or 
GSK3-kinase inhibitor. Our results indicate that brassinosteroid 
inhibits stomatal development by alleviating GSK3-mediated 
inhibition of this MAPK module, providing two key links: that of
a plant MAPKKK to its upstream regulators and that of brassinosteroid
to a specific developmental output.

In animals and plants, steroid hormones have important roles in
coordinating development and metabolism. In contrast to animal
steroid hormones, which act through nuclear receptor transcription
factors, the plant steroid hormone brassinosteroid binds to the
extracellular domain of the membrane-bound receptor kinase
brassinosteroid insensitive 1 (BRI1). This activates intracellular signal
transduction mediated by the serine/threonine protein kinase BSK1,
the protein phosphatase BSU1, the GSK3-like BIN2 kinase, PP2A
phosphatase and BRASSINAZOLE RESISTANT 1 (BZR1) family
transcription factors. When brassinosteroid levels are low, BZR1
is inactivated owing to phosphorylation by BIN2 (refs 11, 12).
Brassinosteroid signalling leads to inactivation of BIN2, and to
PP2A-mediated dephosphorylation and activation of BZR1 (refs 4, 9, 10)
(Supplementary Fig. 1a). Although the brassinosteroid signalling pathway
has been characterized, its connections to other signalling and
developmental pathways are not fully understood.

Stomata are epidermal pores that control gas exchange between the
plant and the atmosphere and are critical for maintaining photosynthetic
and water-use efficiency in the plant. The density and distribution
of stomata in the epidermis of aerial organs are modulated by
intrinsic developmental programs, by hormones and by environmental
factors such as light, humidity and carbon dioxide. The
genetically defined signalling pathway that regulates stomatal development
includes peptide ligands, a receptor protein (TMM), the
ERECTA family of receptor-like kinases (ER, ERL1 and ERL2) and a
MAPK module composed of the MAPK kinase kinase (MAPKKK)
YDA, the MAPK kinases (MAPKKs) MKK4, MKK5, MKK7 and
MKK9, and the MAPKs MPK3 and MPK6 (ref. 15). Potential downstream
targets include basic helix-loop-helix (bHLH) transcription factors
SPEECHLESS (SPCH), MUTE, FAMA, ICE1 (also known as
SCRM) and SCRM2, with SPCH being negatively regulated by direct
MPK3- and MPK6-mediated phosphorylation (Supplementary
Fig. 1b). It is possible that the MAPK pathway integrates environmental
and hormonal inputs to optimize stomatal production, but
nothing is known about the nature of these signals or the biochemical
mechanisms by which they regulate the MAPK pathway.

Excess stomata have been observed in some brassinosteroid-deficient
mutants. To elucidate the function of brassinosteroid in
regulating stomatal development, we examined the distribution of
stomata on leaves of brassinosteroid-deficient and brassinosteroid-signalling
mutants. In wild-type Arabidopsis, stomata are always distributed
with at least one pavement cell between them (Fig. 1a).
Brassinosteroid deficiency causes stomatal clusters (Fig. 1b, c), whereas
treatment with brassinolide (the most active form of brassinosteroid)
reduces stomatal density (Fig. 1d), indicating that brassinosteroid
represses stomatal development. The brassinosteroid-insensitive
mutants bri1-116, the quadruple amiRNA-BSL2,3 bsu1 bsl1 (bsu-q) mutant,
dominant bin2-1 and plants that overexpress BIN2 also exhibit
stomatal clustering (Fig. 1e–h) and overproduce stomatal precursors
(meristemoids and guard mother cells) (Fig. 1u and Supplementary
Fig. 2). In contrast to the weak stomatal clustering phenotype of the
det2-1 and bri1-116 mutants, bsu-q showed large stomatal clusters on
hypocotyls (Supplementary Fig. 4) and cotyledon surfaces consisting
almost entirely of stomata (Fig. 1f, u, and Supplementary Figs 2 and 3).
Surprisingly, the hyperactive bzr1-1D mutation did not affect
stomatal development or suppress the stomatal phenotypes of
bri1-116, bsu-q and bin2-1, although it suppressed their dwarf
phenotypes (Fig. 1i–n and Supplementary Fig. 5). These results indicate
that brassinosteroid regulation of stomatal development is
mediated by upstream signalling components that include BRI1,
BSU1 and BIN2, but that it is independent of the BIN2 substrate BZR1.

Consistent with increased stomatal development in brassinosteroid-insensitive
mutants, fewer stomata were observed in cotyledons of
plants overexpressing some of the positive brassinosteroid-signalling
components of the BSU1 family (Fig. 1q, u and Supplementary
Fig. 6) and in bin2-3 bil1 bil2 loss-of-function mutants lacking three of the
seven brassinosteroid-signalling GSK3-like kinases (Fig. 1o, p, u and
Supplementary Fig. 2). We used bikinin (4-[(5-bromopyridin-2-yl)amino]-4-oxobutanoic
acid, ChemBridge Corporation), a highly specific inhibitor of the seven
Arabidopsis GSK3-like kinases that appear to be
involved in brassinosteroid signalling, to investigate further the
function of brassinosteroid-related GSK3-like kinases in stomatal
development. When added to the growth medium, bikinin decreased
stomatal production in wild-type plants, fully suppressed the stomatal
clustering phenotypes of bin2-1 and partially suppressed the severe
stomatal phenotypes of bsu-q (Fig. 1r–u). These results confirm that
increased activity of the GSK3-like kinases is responsible for enhanced
stomatal production in brassinosteroid-deficient and brassinosteroid-insensitive
mutants.
Figure 1 | Brassinosteroid negatively regulates stomatal development.

a–i, k–t, Differential interference contrast (DIC) microscopy images of abaxial
cotyledon epidermis of 8-day-old seedlings or leaf epidermis of 4-week-old
plants (k, l) with indicated genotypes (Col-0 and Ws are wild-type controls),
grown on medium with BRZ (2 µM), brassinolide (BL, 50 nM) or bikinin (bk,
30 µM) as indicated. j, Growth phenotype of 4-week-old bsu-q and bsu-q bzr1-1D mutants.
u, Quantification of epidermal cell types of the indicated 8-day-old mutants,
expressed as percentage of total cells. GMC, guard mother cell; M, meristemoid.
Brackets in b, c, e, g, h, m, n indicate clustered stomata. Scale bars, 50 µm.


We examined genetic interactions between brassinosteroid mutants
and known stomatal mutants. Expression of constitutively active
YDA (CA-YDA) can completely eliminate stomatal development
(Fig. 2a), probably through activation of a MAP kinase pathway that
phosphorylates and inactivates SPCH. Expression of CA-YDA
completely suppressed stomatal development of the bri1-116, bsu-q
and bin2-1 mutants (Fig. 2b–d). Loss of SPCH was also completely
epistatic to bsu-q in that a bsu-q spch-3 (null) mutant lacked stomata
Figure 2 | Arabidopsis GSK3 acts downstream of ERECTA family and
TMM but upstream of YDA in the stomatal development signalling
pathway. a–d, CA-YDA expression eliminates stomata in bri1-116, bsu-q and
bin2-1. e, f, Loss of SPCH eliminates stomata in bsu-q mutants.
g–k, Representative stomatal phenotypes of leaf epidermis of er erl1 erl2
(g), tmm (h), yda (i), HOPAI1 (j) and scrm-D (k) plants grown in the absence
(−bk) or presence (+bk) of 30 µM bikinin. Scale bars, 50 µm.


and precursors (Fig. 2e, f), indicating that the brassinosteroid signalling
components act upstream of the canonical stomatal MAP kinase
pathway. Bikinin effectively suppressed the weak stomatal clustering
phenotype of tmm and partially suppressed the severe phenotype of
er erl1 erl2 triple mutants (Fig. 2g, h and Supplementary Figs 7 and 8),
but had no significant effect on the phenotypes of the yda mutant,
of plants overexpressing the pathogen effector HOPAI1 (which
inactivates MPK3 and MPK6) or of the scrm-D gain-of-function
mutant (Fig. 2i–k and Supplementary Fig. 8). The brassinosteroid
biosynthetic inhibitor brassinazole also significantly enhanced the
stomatal phenotypes of tmm, but did not further increase stomata in
er erl1 erl2, probably because the er erl1 erl2 surfaces are already nearly
confluent with stomata (Supplementary Fig. 9). These results strongly
indicate that GSK3-like kinases act downstream of the ER and TMM
receptors, but upstream of the YDA MAPKKK.

YDA contains 84 putative GSK3 phosphorylation sites (Ser/Thr-X-X-X-Ser/Thr).
Many of these sites are conserved in the two rice
homologues of YDA, Os02g0666300 and Os04g0559800, and these
homologues also share a highly conserved sequence just amino-terminal
to the kinase domain. Importantly, YDA can be made constitutively
active when part of this region (amino acids 185–322;
Fig. 3a) is deleted. The region that is deleted in CA-YDA contains
23 putative GSK3 phosphorylation sites, including successive
phosphorylation sites that are similar to sites found in the known
BIN2 target BZR1 (Fig. 3a and Supplementary Fig. 10).
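Counting putative sites of this kind is a pattern search. The sketch below scans a protein sequence for the Ser/Thr-X-X-X-Ser/Thr consensus using an overlapping regular-expression match, so chains such as S-X-X-X-S-X-X-X-S are counted at each position, and then counts matches within a region such as amino acids 185–322. The peptide is a made-up toy example, not the YDA sequence, and treating a site as inside a region when its first residue is inside it is an assumption; the motif alone does not show that a residue is actually phosphorylated.

```python
import re

# The lookahead keeps matches overlapping: every S/T with another S/T four
# residues further on (S/T-X-X-X-S/T) is reported, even inside longer chains.
GSK3_CONSENSUS = re.compile(r"(?=[ST].{3}[ST])")


def gsk3_sites(protein: str) -> list[int]:
    """1-based positions of the first S/T of each S/T-X-X-X-S/T match."""
    return [m.start() + 1 for m in GSK3_CONSENSUS.finditer(protein.upper())]


def sites_in_region(sites: list[int], start: int, end: int) -> int:
    """Number of sites whose first residue lies within start..end (inclusive)."""
    return sum(start <= s <= end for s in sites)


toy = "MSAAASAAATPQ"  # hypothetical peptide
sites = gsk3_sites(toy)
print(sites, sites_in_region(sites, 1, 5))
```

A plain `re.findall` without the lookahead would consume each match and miss the second site of an S-X-X-X-S-X-X-X-S chain, which is why overlapping matching is used here.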

We tested whether BIN2 directly interacts with and phosphorylates
YDA. Maltose binding protein (MBP)-YDA was detected in an overlay
assay using GST-BIN2 and anti-GST antibody (Fig. 3b), demonstrating
direct YDA binding to BIN2 in vitro. BIN2 also interacted with
YDA and CA-YDA in yeast two-hybrid assays (Fig. 3c). In vitro kinase
Figure 3 | BIN2 inhibits YDA kinase activity through phosphorylation.

a, Domain structure of YDA. b, Gel blot of indicated proteins (MBP-CDG1 is a
negative control) sequentially probed with GST-BIN2 and anti-GST-HRP
antibody. c, Yeast two-hybrid assays of indicated proteins. d, e, In vitro kinase
assays of BIN2 phosphorylation of YDA or YDA fragment containing amino
acids 185–322 (185–322). Upper panel shows autoradiography and bottom
panel shows protein staining. Mutant BIN2 (mBIN2) is kinase inactive. f, YDA-Myc
plants grown for 5 days on medium containing 2 µM BRZ + 30 µM
bikinin and analysed by anti-Myc immunoblot. g, Proteins transiently
expressed in N. benthamiana leaves, immunoprecipitated (IP) with anti-YFP
antibody, and immunoblotted with anti-Myc or anti-YFP antibody. h, YDA
pre-incubated with BIN2 or mBIN2 (kinase-inactive mutant) and ATP was
purified then incubated with mutant MKK4 (mMKK4) and [γ-32P]ATP,


assays showed that BIN2 phosphorylated YDA, but YDA did not 
phosphorylate a kinase-inactive BIN2 mutant or other brassinosteroid 
signalling components (Fig. 3d and Supplementary Fig. 11). BIN2 
strongly phosphorylated the region deleted in CA-YDA (Fig. 3e), 
indicating that BIN2 might inhibit YDA by phosphorylating its 
autoregulatory domain. 

BIN2 phosphorylation of BZR1 causes mobility shifts of the
phosphorylated BZR1 band in SDS-polyacrylamide gel electrophoresis
(SDS-PAGE) gels. Like BZR1, YDA that was phosphorylated by
BIN2 in vitro also exhibited slower mobility (Fig. 3d and Supplementary
Fig. 11). Consistent with the in vitro data, bikinin treatment of
Arabidopsis seedlings increased the mobility of YDA-Myc in
SDS-PAGE (Fig. 3f). When transiently expressed in Nicotiana
benthamiana leaf cells, both YDA-Myc and CA-YDA-Myc were
co-immunoprecipitated by anti-yellow fluorescent protein (YFP) antibody
when co-expressed with BIN2-YFP but not when expressed alone
(Fig. 3g), demonstrating that BIN2 and YDA interact in vivo.
Furthermore, co-expression of BIN2 retarded the mobility of YDA bands,
but not of CA-YDA bands, in immunoblots
(Fig. 3g). These results confirm that BIN2 mainly phosphorylates the
YDA N-terminal regulatory domain.

Finally, we tested whether BIN2 phosphorylation of YDA affects
YDA kinase activity and whether brassinosteroid and bikinin affect
MAPK activity in plants. YDA was pre-incubated with BIN2 and ATP,
or with a kinase-inactive mutant BIN2 as a control, and then purified
and further incubated with MKK4 (its known substrate), bikinin and
[γ-32P]ATP. Pre-incubation with BIN2, but not with mutant BIN2,


+ bikinin. Numbers indicate relative signal levels normalized to loading
control. i, j, MPK6 and MPK3 activities in seedlings treated with flg22 (10 nM,
positive control), bikinin (30 µM) or BL (100 nM) for 30 min (i) or 2 h
(j), analysed by in-gel kinase assays. Numbers indicate relative signal levels
(upper panel) normalized to the loading control (CBB or MPK6 immunoblot).
k, A model for regulation of stomatal development by two receptor kinase-mediated
signal transduction pathways. When brassinosteroid levels are low,
BIN2 phosphorylates and inactivates YDA, increasing stomatal production.
Brassinosteroid signalling through BRI1 inactivates BIN2, leading to activation
of YDA and downstream MAPK proteins, and suppression of stomatal
development. ERECTA is genetically upstream of YDA; a biochemical link is
not known, but BSU1 and BIN2 or their homologues are strong candidates for
intermediates (dashed line).


decreased YDA phosphorylation of MKK4 (Fig. 3h and Supplementary
Fig. 12), indicating that BIN2 phosphorylation inhibits YDA
activity. Consistent with BIN2 inactivation of YDA, the kinase activities
of MPK3 and MPK6 were reduced in the det2 mutant but increased by
treatment with bikinin or brassinolide (Fig. 3i, j).
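The relative signal levels "normalized to the loading control" reported for these kinase assays follow a common two-step calculation, sketched here under stated assumptions: band intensities have already been quantified (for example by densitometry), each lane's kinase signal is divided by its loading-control signal, and the result is expressed relative to a reference lane set to 1. All lane values are hypothetical.

```python
def relative_signal(signals: list[float], loading: list[float],
                    reference: int = 0) -> list[float]:
    """Normalize each lane to its loading control, then to a reference lane (= 1)."""
    if len(signals) != len(loading):
        raise ValueError("need one loading-control value per lane")
    normalized = [s / l for s, l in zip(signals, loading)]
    ref = normalized[reference]
    # Rounded to two decimals, the precision typically printed under gel lanes.
    return [round(v / ref, 2) for v in normalized]


# Hypothetical in-gel kinase band intensities and matching loading-control values.
kinase_bands = [100.0, 250.0, 60.0]    # e.g. untreated, treated, mutant lanes
loading_bands = [1000.0, 1000.0, 1200.0]
print(relative_signal(kinase_bands, loading_bands))
```

Normalizing to the loading control first means that a lane with more total protein does not appear to have more kinase activity simply because more extract was loaded.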

Taken together, our genetic and biochemical analyses demonstrate 
that brassinosteroid negatively regulates stomatal development by 
inhibiting the BIN2-mediated phosphorylation and inactivation of 
YDA (Fig. 3k). When brassinosteroid levels are low, active BIN2 
directly phosphorylates and inactivates YDA; reduced MAP kinase 
pathway activity can de-repress SPCH, allowing SPCH to initiate 
stomatal development. Brassinosteroid signalling through BRI1,
BSK1 and BSU1 inactivates GSK3, resulting in activation of the 
MAP kinase pathway and inhibition of stomatal production (Fig. 3k). 

This study supports a role for brassinosteroid as a master regulator
that coordinates both physiological and developmental aspects of
plant growth. Previous studies have demonstrated key functions of
brassinosteroid in inhibiting photomorphogenesis and photosynthetic
gene expression. Here we find a role for brassinosteroid in stomatal
production, which must be coordinated with other developmental
processes to optimize photosynthetic and water-use efficiency.
Notably, brassinosteroid represses light-responsive gene expression
and chloroplast development mainly through the BZR1-mediated
transcriptional network, but represses stomatal development
through a BZR1-independent GSK3-MAPK crosstalk mechanism.
Both GSK3 and MAPK are highly conserved in all eukaryotes, but
it remains to be seen whether GSK3 directly inactivates MAPKKK


16 FEBRUARY 2012 | VOL 482 | NATURE | 421 


©2012 Macmillan Publishers Limited. All rights reserved 


LETTER 


proteins in animals. This GSK3-MAPK connection has the potential
to act in multiple receptor kinase-mediated signalling pathways,
mediating crosstalk between these pathways in plants. The stronger
stomatal clustering phenotype of bsu-q and the suppression of er erl1 erl2
stomatal phenotypes by bikinin raise the possibility that members of the
BSU1 and GSK3 families mediate signalling by the ERECTA family
receptor kinases. However, the signals from BRI1 and the ERECTA family
must be partitioned differently downstream, so that BRI1 controls
GSK3 regulation of both BZR1 and YDA but the ERECTA family mainly
controls the GSK3 inactivation of YDA (Fig. 3k), because er erl1 erl2
had no obvious effect on brassinosteroid-regulated BZR1 phosphorylation
(Supplementary Fig. 13). Similar mechanisms and components
might also be used by additional signalling pathways, such as the innate
immunity pathway downstream of the FLS2 receptor kinase, which
shares the BAK1 co-receptor and downstream components MPK3
and MPK6 with BRI1 (ref. 23). In support of such an idea, overexpression
of a GSK3-like kinase reduced the pathogen-induced activation of
MPK3 and MPK6 (ref. 29). How signalling specificity is maintained
when multiple pathways share the same components is a question for
future study, and studies of the brassinosteroid model system will
probably shed light on the hundreds of plant receptor kinases and their
crosstalk during plant responses to complex endogenous and environmental
cues.


METHODS SUMMARY 

Stomatal quantification. Cotyledons of 8-day-old seedlings were cleared in ethanol
with acetic acid (ratio of 19:1, v/v) and mounted on slides in Hoyer’s solution (see
ref. 22). Two to four images at ×400 magnification (180 µm²) were captured per
cotyledon from central regions of abaxial leaves. Guard cells, meristemoids, GMCs
and pavement cells were counted. Statistical analysis was performed with SigmaPlot
software (Systat Software). For treatment with bikinin, seedlings were grown on
half-strength Murashige and Skoog (MS) medium containing dimethylsulphoxide
(DMSO) or 30 µM bikinin (plus 10 µM oestradiol for HOPAI1-inducible lines) for 8
days before stomata were analysed.
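Fig. 1u expresses each epidermal cell type as a percentage of all counted cells. One way to compute this from per-image counts is sketched below. It pools the counts from the two to four images of a cotyledon before dividing; this is an assumption, since averaging per-image percentages instead would give an image with few cells the same weight as one with many. The counts are hypothetical.

```python
from collections import Counter

CELL_TYPES = ("guard cell", "meristemoid", "GMC", "pavement cell")


def cell_type_percentages(images: list[dict[str, int]]) -> dict[str, float]:
    """Pool counts over images, then express each cell type as % of all cells."""
    totals: Counter = Counter()
    for counts in images:
        totals.update(counts)  # adds this image's counts to the running totals
    grand_total = sum(totals[t] for t in CELL_TYPES)
    if grand_total == 0:
        raise ValueError("no cells counted")
    return {t: 100.0 * totals[t] / grand_total for t in CELL_TYPES}


# Hypothetical counts from two images of one cotyledon.
images = [
    {"guard cell": 40, "meristemoid": 5, "GMC": 3, "pavement cell": 52},
    {"guard cell": 36, "meristemoid": 7, "GMC": 2, "pavement cell": 55},
]
print(cell_type_percentages(images))
```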

Biochemical assays. To test the bikinin effect on YDA-Myc phosphorylation,
homozygous YDA-4Myc plants were grown on 1/2 MS medium containing 2 µM
BRZ (brassinazole, an inhibitor of BR synthesis) for 5 days and treated with
30 µM bikinin or 2 µM BRZ solution for 30 min with gentle agitation. Yeast
two-hybrid, in vitro interaction and kinase assays, and in-gel kinase assays were
carried out as described previously. Details of methods are available in the
Supplementary Methods.


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS

Materials and growth conditions. All mutants are in the Columbia ecotype
except yda Y295 (C24 ecotype), CA-YDA (Ler ecotype) and the bin2-3 bil1 bil2
triple mutant obtained from J. Li (Ws ecotype). The erecta triple mutant er-105
erl1-2 erl2-1 (ref. 32) and scrm-D (ref. 24) were obtained from K. Torii. J.-M. Zhou
provided seeds of oestradiol-inducible HOPAI1 transgenic plants. For all analyses,
Arabidopsis seedlings were grown on MS agar medium for 8 days under continuous
light in a Percival growth chamber at 22 °C.

Stomatal quantification. Cotyledons of 8-day-old seedlings were cleared in ethanol
with acetic acid and mounted on slides in Hoyer’s solution (see ref. 22). Two to four
images at ×400 magnification (180 µm²) were captured per cotyledon from central
regions of abaxial leaves. Guard cells, meristemoids, GMCs and pavement cells were
counted. Statistical analysis was performed with SigmaPlot software (Systat Software).
For treatment with bikinin, seedlings were grown on half-strength MS medium
containing DMSO or 30 µM bikinin (plus 10 µM oestradiol for HOPAI1-inducible lines)
for 8 days before stomata were analysed.

Plasmids. For cloning MBP-185/322, a partial cDNA was amplified from a YDA
cDNA clone using primers (forward, 5′-caccAGTAACAAAAACTCAGCTGAGATGTTT-3′;
reverse, 5′-AGAGCTAGGACCAGGGCTTGTCATTCT-3′),
cloned into the pENTR-SD-D-TOPO vector (Invitrogen) and then subcloned into
the Gateway-compatible pMALc2 vector (New England Biolabs). For expression in
plants, cDNA entry clones of YDA and CA-YDA were subcloned into a Gateway-compatible
35S::4myc-6His vector constructed in the pCAMBIA 1390 vector.
BSL2 cDNA in the pENTR vector was subcloned into the Gateway-compatible
pEarley-101 vector to generate 35S::BSL2-YFP.

Overlay assay. To test the interaction of YDA and BIN2 in vitro, a gel blot
separating MBP, MBP-CDG1 (a protein kinase used as a negative control) and
MBP-YDA was incubated with 20 µg GST-BIN2 in 5% non-fat dry milk/PBS
buffer and washed four times. The blot was then probed with HRP-conjugated
anti-GST antibody (Santa Cruz Biotechnology).

In vitro kinase assay. Induction and purification of proteins expressed from
Escherichia coli were performed as described previously. For Fig. 3d, e, 1 µg of
GST-BIN2 or 0.5 µg of MBP-BIN2 was incubated with 1 µg of MBP-YDA or
MBP-185/322 in kinase buffer (20 mM Tris, pH 7.5, 1 mM MgCl2, 100 mM
NaCl and 1 mM DTT) containing 100 µM ATP and 10 µCi [γ-32P]ATP at 30 °C for
3 h. To examine whether BIN2 inhibits YDA activity, equal amounts of MBP-YDA
were pre-incubated with GST-BIN2 or GST-mBIN2 (M1154) for 2 h. Pre-incubated
MBP-YDA was subsequently purified using glutathione beads and
amylose beads to remove GST-BIN2 or GST-mBIN2. Purified YDA was then
incubated with GST-mMKK4 (K108R), 10 µCi [γ-32P]ATP and 10 µM bikinin (to
inhibit any residual BIN2) at 30 °C for 3 h. YDA kinase activity towards mMKK4
was analysed by SDS-PAGE followed by autoradiography.

In-gel kinase assay. The in-gel kinase assay was performed as described
previously*’, with some modifications. Total proteins were extracted with buffer
containing 50 mM Tris, pH 7.5, 150 mM NaCl, 5% glycerol, 1% Triton X-100, 1 mM
phenylmethylsulphonyl fluoride (PMSF), 1 µM E-64, 1 µM bestatin, 1 µM
pepstatin and 2 µM leupeptin. The supernatant obtained from 12,000 r.p.m.
centrifugation was quantified by Bradford protein assay. Equal amounts of protein (40 µg)
were loaded on a 10% SDS-PAGE gel embedded with 0.2 mg ml−1 myelin basic
protein. After electrophoresis, SDS was removed by incubation with washing
buffer (25 mM Tris, pH 7.5, 0.5 mM DTT, 5 mM NaF, 0.1 mM Na3VO4,
0.5 mg ml−1 bovine serum albumin and 0.1% Triton X-100) with three buffer
exchanges at 22 °C for 1.5 h. The gel was incubated with renaturation buffer
(25 mM Tris, pH 7.5, 0.5 mM DTT, 5 mM NaF and 0.1 mM Na3VO4) at 4 °C overnight
with four buffer exchanges. After pre-incubation with 100 ml of kinase reaction
buffer without ATP for 30 min, the gel was incubated with 30 ml of kinase reaction
buffer (25 mM Tris, pH 7.5, 2 mM EGTA, 12 mM MgCl2, 1 mM DTT, 0.1 mM
Na3VO4, 200 nM ATP and 50 µCi [γ-32P]ATP) for 1.5 h. The gel was washed
four times over 2-3 h with a solution containing 5% (w/v) trichloroacetic acid
and 1% (w/v) potassium pyrophosphate. The dried gel was exposed to a
phosphor screen and analysed with a phosphorimager.
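The two clarification spins in these Methods are specified differently (12,000 r.p.m. here, 20,000g for the immunoprecipitation extracts). A minimal Python sketch for converting between the two is shown, using the standard relation RCF = 1.118 × 10^−5 × r × rpm², with r in centimetres. The rotor radius is not reported, so HYPOTHETICAL_RADIUS_CM (7 cm) is an assumed value for illustration only.

```python
# Convert between rotor speed (r.p.m.) and relative centrifugal force (x g)
# using RCF = 1.118e-5 * r_cm * rpm^2.
# The rotor radius is not given in the text; 7 cm is a hypothetical value.

RCF_CONSTANT = 1.118e-5
HYPOTHETICAL_RADIUS_CM = 7.0


def rpm_to_rcf(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (multiples of g) at the given radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rcf_to_rpm(rcf: float, radius_cm: float) -> float:
    """Rotor speed (r.p.m.) needed to reach a target RCF at the given radius."""
    return (rcf / (RCF_CONSTANT * radius_cm)) ** 0.5


# 12,000 r.p.m. spin of the in-gel kinase extract
print(f"{rpm_to_rcf(12_000, HYPOTHETICAL_RADIUS_CM):,.0f} x g")
# speed that would give the 20,000g spin used for immunoprecipitation
print(f"{rcf_to_rpm(20_000, HYPOTHETICAL_RADIUS_CM):,.0f} r.p.m.")
```

Because RCF scales with the radius, the same r.p.m. setting gives different forces on different rotors, which is why g values are the more portable specification.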

Transient interaction assays and analysis of bikinin effects on YDA in transgenic
plants. Agrobacterium GV3101 strains transformed with 35S::CA-YDA-4Myc-6His
or 35S::YDA-4Myc-6His constructs were infiltrated alone or co-infiltrated with
35S::BIN2-YFP-expressing Agrobacterium into N. benthamiana leaves as described
previously’. After 36 h, protein extracts were prepared from N. benthamiana
leaves in immunoprecipitation buffer containing 50 mM Tris, pH 7.5, 150 mM
NaCl, 5% glycerol, 1% Triton X-100, 1 mM PMSF, 1 µM E-64, 1 µM bestatin,
1 µM pepstatin and 2 µM leupeptin. The supernatant obtained from 20,000g
centrifugation was incubated with anti-YFP-antibody-bound protein A beads for 1 h.
Beads were washed five times with immunoprecipitation buffer containing 0.2%
Triton X-100. Immunoprecipitated proteins were eluted with 2× SDS Laemmli
buffer, separated by SDS-PAGE and subjected to immunoblotting using anti-Myc
antibody (Abcam) and anti-YFP antibody.

For transgenic Arabidopsis plants, wild-type Arabidopsis was transformed with 
Agrobacterium containing 35S::YDA-4Myc-6His or 35S::BSL2-YFP construct by 
floral dip. Hygromycin- or Basta-resistant T1 plants were screened by immunoblot
using anti-Myc or anti-YFP antibody, respectively. 

To test the effect of bikinin on YDA-Myc phosphorylation, homozygous YDA-4Myc
plants were grown on half-strength MS medium containing 2 µM BRZ for 5
days and treated with 30 µM bikinin or 2 µM BRZ solution for 30 min with gentle
agitation. YDA-Myc was analysed by immunoblot.
Single-molecule imaging of DNA pairing by RecA
reveals a three-dimensional homology search 


Anthony L. Forget & Stephen C. Kowalczykowski


DNA breaks can be repaired with high fidelity by homologous 
recombination. A ubiquitous protein that is essential for this 
DNA template-directed repair is RecA’. After resection of broken 
DNA to produce single-stranded DNA (ssDNA), RecA assembles 
on this ssDNA into a filament with the unique capacity to search 
and find DNA sequences in double-stranded DNA (dsDNA) that 
are homologous to the ssDNA. This homology search is vital to 
recombinational DNA repair, and results in homologous pairing 
and exchange of DNA strands. Homologous pairing involves DNA 
sequence-specific target location by the RecA-ssDNA complex. 
Despite decades of study, the mechanism of this enigmatic search 
process remains unknown. RecA is a DNA-dependent ATPase, but 
ATP hydrolysis is not required for DNA pairing and strand 
exchange””, eliminating active search processes. Using dual optical 
trapping to manipulate DNA, and single-molecule fluorescence 
microscopy to image DNA pairing, we demonstrate that both the 
three-dimensional conformational state of the dsDNA target and 
the length of the homologous RecA-ssDNA filament have import- 
ant roles in the homology search. We discovered that as the end-to- 
end distance of the target dsDNA molecule is increased, constrain- 
ing the available three-dimensional (3D) conformations of the 
molecule, the rate of homologous pairing decreases. Conversely, 
when the length of the ssDNA in the nucleoprotein filament is 
increased, homology is found faster. We propose a model for the 
DNA homology search process termed ‘intersegmental contact 
sampling’, in which the intrinsic multivalent nature of the RecA 
nucleoprotein filament is used to search DNA sequence space 
within 3D domains of DNA, exploiting multiple weak contacts 
to rapidly search for homology. Our findings highlight the import- 
ance of the 3D conformational dynamics of DNA, reveal a previ- 
ously unknown facet of the homology search, and provide insight 
into the mechanism of DNA target location by this member of a 
universal family of proteins. 

The mechanism by which the RecA family of DNA strand exchange 
proteins (which include T4 UvsX, archaeal RadA and eukaryotic 
Rad51) locate DNA sequence identity is unknown. Ensemble studies 
have constrained possible mechanisms by establishing that ATP 
hydrolysis is not needed** and 1D sliding is not operative’. Con- 
sequently, the manner by which the RecA nucleoprotein filament 
promotes the efficient, rapid and accurate search for homology has 
remained undefined for decades*. Single-molecule methods have the 
potential to provide new insight into this long-standing question. In 
fact, magnetic tweezer experiments showed that the endpoint of 
homologous pairing can be detected as a change in the length of a 
single dsDNA target molecule”*. However, the mechanism by which 
homology was found and DNA pairing occurred was not shown. 
Therefore, we sought to directly observe the manner by which RecA 
nucleoprotein filaments locate their homologous target in dsDNA. 

Initially we attempted to directly observe fluorescent RecA nucleoprotein
filaments interacting with bacteriophage λ dsDNA in real time
by using total internal reflection fluorescence microscopy (TIRFM)’.
Fully homologous fluorescent ssDNA that was complementary to
three different loci of λ DNA (Fig. 1A) was generated by incorporation
of 5-(3-aminoallyl) dUTP into ssDNA using polymerase chain
reaction (PCR), followed by covalent attachment of ATTO565
(Supplementary Methods). RecA nucleoprotein filaments were assembled
on these fluorescent ssDNA substrates in ensemble reactions
containing ssDNA-binding protein (SSB) and the non-hydrolysable ATP
analogue ATPγS (adenosine 5′-O-(3-thiotriphosphate))*. ATPγS was used to
maintain the filament in its active form, eliminate filament disassembly
and prevent dissociation of DNA pairing products”’?”*. Using
biochemical assays, we confirmed that the fluorescent ssDNA
generated by this procedure was functional for RecA-mediated DNA
pairing (Supplementary Fig. 1). The λ dsDNA, biotinylated at each
end, was attached under flow to the interior surface of a single-channel
microfluidic device (flowcell) (Fig. 1B). Owing to sequential attachment
of each end to the streptavidin-coated surface, most DNA
molecules were extended to nearly (~80%) B-form length, and
extension could be maintained in the absence of flow (Fig. 1B, a, b).

Figure 1 | DNA pairing by RecA, imaged using single-molecule TIRFM,
indicates that the three-dimensional conformation of target dsDNA is
important in the homology search. A, DNA substrates. B, DNA pairing
between λ DNA (green) and RecA filament assembled on 430-nucleotide (nt)
ssDNA (red). The ensemble reaction was examined by TIRFM (B, a). In in situ
reactions, dsDNA was attached before pairing: doubly attached extended DNA
(B, b), singly attached DNA (B, c) and doubly attached DNA with ends in
proximity (B, d). Homologously paired products were observed in B, c and
B, d when DNA was relaxed by stopping flow and then extended by flow for
visualization. Scale bars, 2.4 µm.

To confirm DNA pairing at the homologous λ DNA target site,
reactions were conducted under ensemble conditions, and products
were extended on the surface of a flowcell for analysis by single-molecule,
two-colour TIRFM; dsDNA was imaged by YOYO1 binding
(green) and ssDNA by ATTO565 (red). DNA pairing products were
observed; the sites of interaction coincided with the region of
homology within the λ DNA molecule (Fig. 1B, a). For the 430-nucleotide
ssDNA, all bound fluorescent ssDNA-RecA filaments were
at the homologous locus (observed fractional distance 0.51 ± 0.02;
n = 21; Supplementary Fig. 2).
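The expected position of this locus can be estimated from the λ coordinates of the 430-bp product listed in Methods (base pairs 23,788-24,217). The sketch below assumes a λ genome length of 48,502 bp, that the reported fractional distance refers to the locus midpoint, and that a surface-extended molecule can lie in either orientation; fractional_position is a hypothetical helper.

```python
# Fractional position of a homologous locus along lambda DNA.
# Assumes the reported "fractional distance" refers to the locus midpoint and
# that molecules can be imaged in either orientation.

LAMBDA_LENGTH_BP = 48_502  # bacteriophage lambda genome length


def fractional_position(start_bp: int, end_bp: int,
                        genome_bp: int = LAMBDA_LENGTH_BP) -> tuple[float, float]:
    """Locus midpoint as a fraction of molecule length, measured from each end."""
    midpoint = (start_bp + end_bp) / 2
    frac = midpoint / genome_bp
    return frac, 1.0 - frac


# 430-nt substrate: lambda base pairs 23,788-24,217 (see Methods)
near, far = fractional_position(23_788, 24_217)
print(f"expected {near:.3f} (or {far:.3f} if flipped); observed 0.51 +/- 0.02")
```

Both orientations give a value (about 0.495 or 0.505) that lies within the observed 0.51 ± 0.02.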

Next, we attempted to detect homologous pairing in real time using
single-molecule TIRFM. Preformed RecA nucleoprotein filaments
were introduced into a flowcell to which λ DNA molecules were
tethered, buffer flow was stopped, and the reaction was monitored
in real time (Fig. 1B, b). Although the dsDNA was readily visible, we
failed to observe any interaction between the fluorescent nucleoprotein
filaments and extended λ DNA, even for reaction periods longer than
1 h. However, we noticed that in addition to the desired doubly
tethered, extended λ DNA molecules, some DNA molecules were
attached by only one end (Fig. 1B, c). When flow was stopped to score
pairing with the doubly tethered λ DNA molecules, these singly
tethered molecules relaxed to a randomly coiled state. Unexpectedly,
when these unconstrained DNA molecules were subsequently
re-extended by buffer flow, 80% (n = 20) revealed a stable pairing
product (Fig. 1B, c). This finding suggested that either a free DNA
end or a randomly coiled DNA was needed for pairing. In the same field
of view, there were also λ DNA molecules that had both ends attached,
but at a relatively close end-to-end distance (Fig. 1B, d). When the flow
was stopped, we observed that these molecules also participated in
homologous pairing during the time that flow was off, demonstrating
that a free DNA end was not required. These unanticipated results
revealed that DNA pairing did not occur on DNA that was extended to
near its entropic elastic limit, and suggested that the DNA homology
search required the 3D states that are accessible in randomly coiled
DNA. Collectively, they suggested that a coiled conformation of the
target dsDNA is crucial.

To address this possibility, we developed an alternative single-molecule
imaging strategy that permitted reproducible measurement
of the effects of dsDNA conformation, unperturbed by
flow, on the DNA homology search process. This method uses a
specialized flowcell (Fig. 2A), two optical laser traps operated in
position-clamp mode, epifluorescence detection, fluorescent RecA-ssDNA
filaments and a λ DNA dumbbell (a single λ DNA molecule
with a 1-µm polystyrene bead attached at each end’’; Supplementary
Methods). The DNA pairing assay was performed in situ using the
dsDNA dumbbell target, and the dual optical trap configuration was
used to reliably vary the end-to-end distance of the dsDNA. The
flowcell has four channels and a flow-free reservoir. DNA dumbbells
were moved between channels of the flowcell by stage translation,
and the optical traps were moved relative to one another using a
steering mirror controlling one of the traps. Each experiment (Fig. 2B
and Supplementary Movie 1) consisted of six steps: first, in channel one,
a streptavidin-coated bead was trapped in each of the two optical traps
(Fig. 2B, a); second, the beads were moved to channel two to capture a
λ dsDNA molecule (biotinylated on both ends and stained with YOYO1)
on one bead (Fig. 2B, b); third, the beads were moved into channel three,
and by independent steering of a trap, the distal end of the DNA was attached
to the second bead (Fig. 2B, c); fourth, the DNA dumbbell was moved
to the dye-free channel for de-staining, and the end-to-end distance
was fixed (Fig. 2B, d); fifth, the DNA dumbbell was moved to the flow-free
reservoir containing the fluorescent ssDNA-RecA filaments
(Fig. 2B, e); and sixth, after a defined incubation time, the DNA
dumbbell was moved back to channel four, which is free of
nucleoprotein filaments, extended to its contour length (~16 µm)
and examined for DNA pairing products (Fig. 2B, f).

Figure 2 | Visualization of RecA-promoted DNA pairing with an individual
optically trapped DNA dumbbell, imaged by epifluorescence. A, Four-channel
flowcell with a flow-free reservoir. B, DNA dumbbell assembly and
RecA-pairing reaction: two beads (yellow) are trapped (B, a); a λ DNA
molecule (green) is captured on one bead (B, b); the free DNA end is captured
with the second bead using a steerable optical trap (B, c); the centre-to-centre
bead distance is set and YOYO1 is removed (B, d, de-stain); the DNA dumbbell
is incubated in the reservoir with RecA nucleoprotein filaments (red) (B, e) and
DNA is extended to visualize products (B, f). C, Images of pairing products with
430- and 1,762-nucleotide nucleoprotein filaments.

Shown in Fig. 2C are representative products of reactions in which
the DNA dumbbells were initially held at a centre-to-centre bead
distance of 2 µm and incubated for 2 min in the reservoir that
contained RecA nucleoprotein filaments. For the two homologous
ssDNA nucleoprotein filaments shown (430 nucleotides and 1,762
nucleotides), pairing is clearly at the homologous locus. For a
2-min incubation with dsDNA at a bead-to-bead distance of 2 µm and
the 430-nucleotide substrate, 90% of the dsDNA molecules (n = 29)
contained a nucleoprotein filament stably bound to the expected region
of homology (Fig. 3a). To determine the effect of end-to-end distance
(that is, 3D conformation) on the RecA-mediated DNA pairing reaction,
the reactions were performed at increasing bead separations (Fig. 3a).
As the bead distance was increased from 2 µm to 8 µm, the efficiency of
DNA pairing decreased to near zero, extrapolating to zero at ~9 µm;
for comparison, in the TIRFM experiments in which no DNA pairing
was detected in situ, the DNA end-to-end distance was ~13 µm.
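Bead centre-to-centre distances can be related to the end-to-end distance of the tethered λ DNA and to its fractional extension. The following is a sketch under two assumptions not stated in the text: a B-form rise of 0.34 nm per base pair, and attachment of the DNA ends at the bead surfaces along the trap axis, so that one bead diameter (1 µm) is subtracted from the centre-to-centre distance. Under these assumptions, separations of 2 µm and 6 µm correspond to end-to-end distances of 1 µm and 5 µm, and the ~13-µm end-to-end distance in the TIRFM experiments is roughly 80% of the ~16.5-µm contour length.

```python
# Relate bead centre-to-centre distance to DNA end-to-end distance and to the
# fraction of lambda DNA contour length.
# Assumptions: B-form rise of 0.34 nm per bp; the DNA ends attach at the bead
# surfaces along the trap axis, so one bead radius is subtracted per bead.

LAMBDA_BP = 48_502
RISE_NM_PER_BP = 0.34
BEAD_DIAMETER_UM = 1.0  # 1-um polystyrene beads

CONTOUR_UM = LAMBDA_BP * RISE_NM_PER_BP / 1_000  # about 16.5 um


def end_to_end_um(centre_to_centre_um: float,
                  bead_diameter_um: float = BEAD_DIAMETER_UM) -> float:
    """DNA end-to-end distance for a given bead centre-to-centre distance."""
    return centre_to_centre_um - bead_diameter_um


for sep in (2, 4, 6, 8, 9):
    d = end_to_end_um(sep)
    print(f"beads {sep} um apart -> DNA ends {d:.0f} um apart "
          f"({d / CONTOUR_UM:.0%} of contour length)")

# In the TIRFM experiments the end-to-end distance (~13 um) was measured directly
print(f"TIRFM: {13 / CONTOUR_UM:.0%} of contour length")
```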

We compared the time course of homologous pairing for fixed
centre-to-centre bead distances of 2 µm and 6 µm (Fig. 3b) to determine
the effect of reducing the number of accessible DNA conformational
states on the rate of the reaction. For the 2-µm separation, DNA pairing
proceeded with a half-time of ~30 s and approached a yield of 100%.
When the separation was increased to 6 µm, the rate slowed fourfold to
a half-time of ~125 s, but the yield nevertheless approached 100%
(Fig. 3b). To establish the kinetic reaction order, we conducted
single-molecule DNA pairing assays as a function of RecA nucleoprotein
filament concentration (Supplementary Fig. 3). The reaction rate
was independent of nucleoprotein filament concentration, showing
that DNA pairing under these conditions is not diffusion limited,
but is instead limited by a rate-determining unimolecular step,
as in the ensemble studies'*. However, the pairing rate was dependent
on dsDNA conformation and therefore was not dependent on the
sequence recognition step itself.
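Because pairing is limited by a unimolecular step, each time course can be summarized by a first-order rate constant, k = ln 2/t½. A minimal sketch, assuming single-exponential kinetics with a maximum yield near 100%; the helper names are hypothetical:

```python
import math

# Single-exponential (first-order) description of the pairing time courses.
# Assumes yield(t) = Y_max * (1 - exp(-k t)), with Y_max ~ 1 as reported.


def rate_from_half_time(t_half_s: float) -> float:
    """First-order rate constant k = ln 2 / t_half, in s^-1."""
    return math.log(2) / t_half_s


def fraction_paired(t_s: float, k_per_s: float, y_max: float = 1.0) -> float:
    """Predicted fraction of dumbbells with a paired product at time t."""
    return y_max * (1.0 - math.exp(-k_per_s * t_s))


k_2um = rate_from_half_time(30)    # ~30 s half-time, 2-um bead separation
k_6um = rate_from_half_time(125)   # ~125 s half-time, 6-um bead separation

print(f"k(2 um) = {k_2um:.4f} s^-1, k(6 um) = {k_6um:.4f} s^-1")
print(f"slow-down: {k_2um / k_6um:.1f}-fold")
print(f"predicted yield after 2 min at 2 um: {fraction_paired(120, k_2um):.0%}")
```

With these assumptions, the 2-µm half-time corresponds to k ≈ 0.023 s−1, matching the rate reported for the 430-nucleotide filament in the Fig. 3 legend, and the predicted yield after a 2-min incubation (about 94%) is close to the 90% observed (n = 29).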

To understand the nature of the complex that limits the rate of DNA 
pairing, we varied the length of RecA nucleoprotein filaments. Shown 
in Fig. 3c is a comparison of the time courses for 162-, 430- and 1,762- 
nucleotide nucleoprotein filaments. Increasing the ssDNA length 
approximately fourfold, from 430 to 1,762 nucleotides, increased the 
observed rate of pairing approximately 3.8-fold. However, when the 
length of the ssDNA was decreased to 162 nucleotides, we did not 
observe any stably bound homologously paired products after incubations
for 10 min at the closest bead-to-bead distance possible (2 µm),
despite this substrate being active in ensemble DNA pairing reactions 
(Supplementary Fig. 2). We conclude that the length of the RecA 
nucleoprotein filament is a crucial factor in the rate-limiting step of 
homologous pairing. 
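The effect of filament length can be expressed as ratios of the fitted rates. The sketch uses the rounded rates reported in the Fig. 3c legend; with these rounded values the rate ratio is about 3.7, slightly below the ~3.8-fold quoted above, presumably because the text uses unrounded fits (an assumption).

```python
import math

# Observed first-order pairing rates at 2-um bead separation, as reported in the
# Fig. 3c legend (s^-1). No stable product formed with the 162-nt filament.
RATES_PER_S = {430: 0.023, 1762: 0.086}

length_ratio = 1762 / 430
rate_ratio = RATES_PER_S[1762] / RATES_PER_S[430]
print(f"ssDNA length ratio: {length_ratio:.2f}; rate ratio: {rate_ratio:.2f}")

for length_nt, k in RATES_PER_S.items():
    # half-time corresponding to each rate, t_half = ln 2 / k
    print(f"{length_nt} nt: t_half ~ {math.log(2) / k:.0f} s")
```

The near-proportional scaling (about 4.1-fold more nucleotides, about 3.7-fold faster) is what the length dependence described above amounts to numerically.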

In addition to the anticipated stable, homologously paired end products,
short-lived non-homologous interactions were observed
(Fig. 4a). These events, which occurred outside of the homologous
regions, were relatively unstable and dissociated during the movement
of the molecule from the reservoir to the observation channel, during
the separation of beads or after the λ DNA molecule was extended
(Supplementary Movie 2). These heterologous events lasted no more
than a few tens of seconds and never persisted on a timescale of minutes.
When the molecules from the 2-µm data set were analysed, 22% of the
reactions with the 430-nucleotide ssDNA and 40% of reactions with the
1,762-nucleotide ssDNA had these unstable heterologously paired
intermediates (Fig. 4b), whereas for the 162-nucleotide ssDNA, only 1
heterologously bound filament was seen out of 28 molecules.

Figure 3c legend (fragment): … length; 162 nucleotides (triangles; n = 5, 6, 4 and 2 at
the times indicated), 430 nucleotides (squares; same data as Fig. 3b; n = 10) and
1,762 nucleotides (circles; n = 10); error bars, s.e.m.; 2-µm separation; respective
rates: zero, 0.023 (± 0.002) s−1 and 0.086 (± 0.026) s−1.

Some intermediates of the pairing process had a second filament
bound non-specifically to spatially separated regions of the λ DNA
molecule. For such a heterologously bound nucleoprotein filament,
when the relaxed DNA molecule was moved into the observation
channel and the beads were separated for observation, the existence
of a loop could be inferred from a sudden recoil of the homologously
paired spot. As the beads were separated, the weaker of the two
heterologous interactions was released, and there was a simultaneous
movement (‘jump’) of the fluorescence at the homologous pairing
locus (Fig. 4a and Supplementary Movie 3) resulting from the release
of DNA that was constrained in the loop. Approximately 12% (n = 50)
of the DNA dumbbells showed loop-release events for the 430-nucleotide
nucleoprotein filament and, consistent with expectations,
when the length of the nucleoprotein filament was increased to 1,762
nucleotides, the number of molecules with transient loop structures
increased to 47% (n = 30) (Fig. 4c).
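The percentages in this paragraph are proportions of small samples, so their uncertainty is substantial. A sketch of a binomial standard error, sqrt(p(1 − p)/n), is given below; the event counts (6 of 50 and 14 of 30) are back-calculated from the rounded percentages, and the error bars in Fig. 4 may have been computed differently.

```python
import math

# Fraction of dumbbells showing loop release, with a binomial standard error.
# Assumption: error computed as sqrt(p(1 - p)/n); counts are back-calculated
# from the rounded percentages in the text (12% of 50, 47% of 30).


def proportion_with_sem(successes: int, n: int) -> tuple[float, float]:
    """Return (p, s.e.m.) for `successes` out of `n` observations."""
    p = successes / n
    return p, math.sqrt(p * (1.0 - p) / n)


for label, events, n in (("430 nt", 6, 50), ("1,762 nt", 14, 30)):
    p, sem = proportion_with_sem(events, n)
    print(f"{label}: {p:.0%} +/- {sem:.0%} (n = {n})")
```

Under this estimate the two fractions (about 12 ± 5% and 47 ± 9%) are separated by several standard errors.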

Figure 4 | RecA nucleoprotein filaments exhibit transient non-homologous
interactions and loop-release events. a, Kymograph of DNA dumbbell during
bead separation (Fig. 2B, f). Distance scale (top) and tick marks show positions
of beads (green) and nucleoprotein filaments (red); illustration depicts
dissociation of heterologously bound filament. b, c, Fraction of dsDNA
dumbbells with non-homologously bound intermediates (b) and loop-release
events (c); 430-nucleotide (blue) and 1,762-nucleotide (green) filaments;
n = 50 and 30, respectively. Error bars, s.e.m. d, Model for RecA homology
search by intersegmental contact sampling; for simplicity, only two
simultaneous points of interaction are depicted.

Our results clearly establish that both the 3D conformation of
dsDNA and the length of the nucleoprotein filament are important
determinants of the rate of homologous DNA pairing. These findings
lead us to propose a model termed ‘intersegmental contact sampling’
to describe the search for homology by a RecA nucleoprotein filament
(Fig. 4d). One of the key features of the model is that the RecA
nucleoprotein filament has a polyvalent interaction surface that is capable of
binding simultaneously and non-specifically, but weakly, to non-contiguous
segments of dsDNA. The second, related feature of this
model is that 3D conformational entropy of the dsDNA greatly
enhances the probability that DNA sequence homology will be found 
through iterated homology sampling, using multiple weak contacts, by 
this polyvalent filament. This model is compatible both with our key 
experimental findings, which we expect would apply to the search in 
the presence of ATP as well, and with the involvement of heterolo- 
gously bound intermediates that have been inferred from biochemical 
studies'*"°. Our data show that dsDNA extended to near contour 
length fails to produce homologously paired products. This finding
explains why the formation of stable DNA pairing products in single-molecule
studies using magnetic tweezers required negative plectonemic supercoils in the DNA
target”®. By contrast, when a ssDNA-RecA filament was extended to
near its contour length, homologous pairing with fully homologous
coiled dsDNA occurred’, which is compatible with our finding that the
coiled structure of dsDNA is essential to the homology search. Here we
established that as the end-to-end distance of the dsDNA was
decreased, allowing it to assume a more random coil-like 3D conformation,
the rate of DNA pairing increased because the local DNA
concentration increased, and the likelihood that DNA segments would be
in close proximity also greatly increased. The increased local DNA
concentration results in a greater statistical probability that a single 
nucleoprotein filament can simultaneously interact with and sample 
multiple regions of the same DNA molecule. This, in turn, is manifest 
as a kinetically more efficient homology sampling process. In further 
support of the intersegmental contact sampling model, when the 
length of the ssDNA in the nucleoprotein filament is increased, the 
observed rate of pairing, as well as the number of nucleoprotein 
filaments with multiple, transient, heterologous intersegmental interactions,
increases. This shows that longer nucleoprotein filaments
can simultaneously and independently sample more segments of the 
target dsDNA than shorter nucleoprotein filaments. Kinetically, our 
findings are consistent with the following two-step scheme: 


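$$\mathrm{NPF} + \mathrm{dsDNA} \overset{K_{\mathrm{het}}}{\rightleftharpoons} \underset{\text{(heterologously bound)}}{\mathrm{NPF{-}dsDNA}} \xrightarrow{k_s} \underset{\text{(homologously paired)}}{\mathrm{NPF{-}dsDNA}}$$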
where $K_{\mathrm{het}}$ is the equilibrium constant for the binding of a RecA
nucleoprotein filament (NPF) to heterologous dsDNA (the kinetic
steps comprising $K_{\mathrm{het}}$ are rapid compared to $k_s$) and $k_s$ is the
rate-limiting unimolecular rate constant for the intersegmental homology
searching step within the dsDNA molecule or domain. In general, this
kinetic formalism predicts a hyperbolic dependence of the rate of homologous
pairing on the component concentrations unless the equilibrium constant
for formation of the heterologous complex is large; when this is the
case, the observed rate is defined by the first-order rate constant, $k_s$.
Given that the rate of target location is independent of nucleoprotein
filament concentration, the heterologously bound complex must be
saturated at a filament concentration of 100 pM (Supplementary Fig. 3),
which places a limit on the apparent equilibrium dissociation constant
of <10 pM (that is, $K_{\mathrm{het}} > 10^{11}\,\mathrm{M}^{-1}$). In the
context of this kinetic model, values for $k_s$ are defined by the
experiments in Fig. 3b, c, which show that the rate of the intersegmental
homology search decreases fourfold when the DNA end-to-end
distance increases from 1 µm to 5 µm and increases approximately
fourfold when the ssDNA length increases approximately fourfold.
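The two-step scheme above can be evaluated numerically. Writing K_het for the equilibrium constant of heterologous binding and k_s for the rate-limiting search rate constant, rapid pre-equilibrium gives k_obs = k_s K_het[NPF]/(1 + K_het[NPF]). The sketch evaluates this hyperbola at the lower bound K_het = 10^11 M−1, treating the filament as being in large excess over the single dsDNA dumbbell (so free and total [NPF] are equal) and using the 430-nucleotide rate of 0.023 s−1 as a stand-in for k_s purely for illustration.

```python
# Rapid pre-equilibrium binding followed by a first-order search step:
#   NPF + dsDNA <=> (NPF-dsDNA)_het  -> (NPF-dsDNA)_paired
# k_obs = k_s * K_het [NPF] / (1 + K_het [NPF])
# Assumes [NPF] is in large excess, so free [NPF] ~ total [NPF].

K_HET_PER_M = 1e11   # lower bound on K_het from the text (M^-1)
K_S_PER_S = 0.023    # illustrative stand-in for k_s (430-nt rate, Fig. 3c)


def fraction_bound(npf_molar: float, k_het: float = K_HET_PER_M) -> float:
    """Fraction of dsDNA in the heterologous complex at equilibrium."""
    x = k_het * npf_molar
    return x / (1.0 + x)


def k_obs(npf_molar: float, k_het: float = K_HET_PER_M,
          k_s: float = K_S_PER_S) -> float:
    """Observed pairing rate (s^-1): hyperbolic in [NPF], saturating at k_s."""
    return k_s * fraction_bound(npf_molar, k_het)


for npf_pm in (10, 100, 200, 1000):
    c = npf_pm * 1e-12
    print(f"[NPF] = {npf_pm:>4} pM: bound {fraction_bound(c):.2f}, "
          f"k_obs = {k_obs(c):.4f} s^-1")
```

At the 0.2 nM (200 pM) filament concentration used in the reservoir, this bound implies about 95% occupancy, so the observed rate would be within about 5% of k_s, which is consistent with the concentration independence described above.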
The correlation of rate with the length of ssDNA suggests that the 
intradomainal search is enhanced proportionately by the increase in 
either heterologous contacts or the reach of the longer ssDNA. In 
many regards, the homology search by RecA has parallels to target 
location by sequence-specific DNA-binding proteins, with the notable 
exception that the specificity of the RecA filament is determined by the 
sequence of the associated ssDNA. Seminal work on the DNA target 
selection by transcriptional regulatory proteins identified sliding, 
hopping and intersegmental transfer as potentially facilitating 
mechanisms'”'*. Here we have established intersegmental transfer as 
the operative pathway used by RecA to find DNA sequence homology; 


426 | NATURE | VOL 482 | 16 FEBRUARY 2012 


this behaviour is distinct from the sliding and hopping used to enhance 
the rate of target location by most regulatory proteins, which are 
typically univalent or bivalent with regard to site binding’*. Our 
approach now provides a framework for future studies on the previ- 
ously mysterious homology search by recombination proteins. It is 
applicable to studies of more complex systems such as eukaryotic 
Rad51, as it can provide insight into the function of the many accessory 
proteins that enhance DNA pairing”. Finally, the imaging strategy and 
flow-free cell design can easily be adapted to visualize target location 
and mechanism of processes as diverse as DNA replication and repair, 
RNA interference, transcription and protein translation, in which the 
3D conformations of nucleic acids are undoubtedly important. 


METHODS SUMMARY 


RecA and SSB were purified as described’’”’. Fluorescent ssDNA was prepared as 
detailed in the Supplementary Information. Nucleoprotein filaments were formed 
as described’ in SM buffer (25 mM Tris acetate (Tris-OAc) (pH 7.5), 1 mM DTT
and 4 mM Mg(OAc)2): SSB (at a ratio of 1 SSB monomer to 11 nucleotides), 2 nM
(molecules) fluorescent ssDNA and 1 mM ATPγS were incubated for 10 min at
37 °C; RecA was then added at 1 monomer per 1.7 nucleotides, and the mixture
was incubated for 1 h. Nucleoprotein filaments were diluted to 0.2 nM before use.
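The stoichiometries above translate into concrete component concentrations for each substrate length. A short sketch, assuming the SSB and RecA ratios refer to the total ssDNA nucleotide concentration (2 nM molecules × length); filament_mix is a hypothetical helper:

```python
# Component concentrations implied by the filament-assembly ratios above.
# Assumption: the ratios refer to total ssDNA nucleotides in the mix.


def filament_mix(ssdna_nM: float, length_nt: int,
                 nt_per_reca: float = 1.7,
                 nt_per_ssb: float = 11.0) -> dict[str, float]:
    """Return nucleotide, RecA-monomer and SSB-monomer concentrations (nM)."""
    nt_nM = ssdna_nM * length_nt
    return {
        "nucleotides": nt_nM,
        "RecA": nt_nM / nt_per_reca,
        "SSB": nt_nM / nt_per_ssb,
    }


for length in (162, 430, 1762):
    mix = filament_mix(2.0, length)
    print(length, "nt:", {name: round(conc, 1) for name, conc in mix.items()}, "nM")
```

For the 430-nucleotide substrate this gives 860 nM nucleotides, about 506 nM RecA and about 78 nM SSB monomers before the tenfold dilution to 0.2 nM filaments.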

For DNA pairing using TIRFM, biotinylated λ DNA (1 pM molecules) in SM2
(SM with 50 mM NaCl) was bound to the flowcell, which was then washed to
remove free DNA and to attach the second DNA end. Reactions were started by
addition of 0.2 nM nucleoprotein filaments. For ensemble experiments visualized
by TIRFM, nucleoprotein filaments and λ DNA were incubated for 1 h
(162-nucleotide substrate) or 30 min (430-nucleotide substrate) at 37 °C.

Visualization of RecA-mediated pairing with individual DNA dumbbells was
performed at 37 °C. The flowcell was treated for 1 h with BSA (1 mg ml−1) in SM3
(50 mM Tris-OAc (pH 8.2), 50 mM DTT, 1 mM Mg(OAc)2 and 15% sucrose).
Biotinylated λ DNA and buffers were pumped into the flowcell at a linear flow rate
of ~100 µm s−1. Channels contained SM3, 18 fM streptavidin-coated polystyrene
beads (1 µm, Bangs Laboratories) and 5 nM YOYO-1 (Invitrogen) (Fig. 2B, a);
SM3, 100 nM YOYO1 and 10 pM (molecules) biotinylated λ DNA (Fig. 2B, b);
SM3 (Fig. 2B, c); and SM with 15% sucrose (Fig. 2B, d, f). The reaction reservoir
contained 0.2 nM nucleoprotein filaments in SM with 15% sucrose and 0.5 mM
ATPγS (Fig. 2B, e).


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS


Microscope. The instrument was based on an Eclipse TE2000-U
inverted microscope with a total internal reflection fluorescence (TIRF)
attachment (Nikon) using a CFI Plan Apo TIRF 100×, 1.45 numerical aperture
oil-immersion objective. Infrared laser trapping, operated in position-clamp mode,
was achieved almost exactly as previously described”' with the addition of a polarizer
(Newport) to split the beam and generate two traps, and a steering mirror
(Newport) to control the x-y position of one of the beams. Fluorescence of the
sample in TIRF mode was achieved by excitation using a Cyan 488-nm laser
(Picarro) or a 561-nm laser (Cobolt). Epifluorescence illumination was achieved
with an X-Cite 120-W mercury vapour lamp (Lumen Dynamics). The fluorescence
emission was directed through a polychroic mirror (centre wavelength 515 nm,
bandwidth 30 nm; and centre wavelength 600 nm, bandwidth 40 nm; Chroma). Light
was guided into a Dual-View apparatus (Optical Insights) where the green and red
components were spatially separated (dichroic 565dxcr, emission HQ515/30 nm
and HQ600/40 nm, Chroma). Movies were captured on a DU-897E iXon CCD
camera (Andor, 100-ms exposure) and processed using IQ imaging software
(Andor).
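The dual-view emission filters define the two detection channels. The sketch below computes each passband from the centre wavelength and bandwidth given above and assigns a dye to a channel. The emission maxima used (about 509 nm for YOYO-1 and about 590 nm for ATTO565) are assumed approximate literature values, not data from this paper.

```python
from typing import Optional

# Emission filters from the text: (centre wavelength, bandwidth) in nm
FILTERS_NM = {"green": (515, 30), "red": (600, 40)}

# Approximate emission maxima; assumed literature values, not from the text
EMISSION_MAX_NM = {"YOYO-1": 509, "ATTO565": 590}


def passband(centre: float, width: float) -> tuple[float, float]:
    """Lower and upper edge of a band-pass filter."""
    return centre - width / 2, centre + width / 2


def channel_for(wavelength_nm: float) -> Optional[str]:
    """Dual-View channel whose emission filter transmits this wavelength."""
    for name, (centre, width) in FILTERS_NM.items():
        low, high = passband(centre, width)
        if low <= wavelength_nm <= high:
            return name
    return None


for dye, em in EMISSION_MAX_NM.items():
    print(f"{dye} (~{em} nm) -> {channel_for(em)} channel")
```

The 500-530 nm and 580-620 nm passbands do not overlap, which is what allows dsDNA (YOYO1) and ssDNA (ATTO565) to be imaged simultaneously in separate halves of the camera.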

Biotinylated λ duplex DNA. Multiple biotin moieties were incorporated into
both ends of bacteriophage λ DNA (NEB) by an end-filling reaction. A 30-µl
reaction contained 1× NEB buffer number 2, 33 µM each of dATP, dTTP,
dCTP and biotin-11-dGTP (Perkin Elmer), 5 µg λ DNA and 5 units of Klenow
exo (NEB). The reaction was incubated for 15 min at 25 °C, then terminated by
the addition of EDTA to a final concentration of 10 mM and heat inactivation of
Klenow at 75 °C for 20 min. The reaction was then diluted to a 100-µl final volume
with Nanopure water (Millipore) and passed through an S-400 spin column (GE
Healthcare) equilibrated with TE buffer (10 mM Tris-HCl (pH 7.5) and 1 mM
EDTA).

Fluorescent ssDNA substrates. The DNA primer sequences used to amplify
defined regions of λ DNA by PCR were as follows. For an 87-bp product for the
D-loop assay with pUC19 supercoiled DNA: forward primer
5′-biotin-CGACGGCCAGTGAATTCCCCGA-3′, reverse primer
5′-TTACGCCAAGCTTACTCGGGAAACAT-3′; for a 162-bp product (identical to λ DNA
between base pairs 12,368-12,529): forward primer
5′-biotin-TAACGTCATGTCAGAGCAGAAAAAG-3′, reverse primer
5′-GCAATACCATCAAAGGTCTGCGTG-3′; for a 430-bp product (identical to λ DNA
between base pairs 23,788-24,217): forward primer
5′-biotin-ACTGTTCTTGCGGTTTGGAGG-3′, reverse primer
5′-CTATCGGAAGTTCACCAGCCAG-3′; and for a 1,762-bp product (identical to λ DNA
between base pairs 13,767-15,528): forward primer
5′-biotin-GGATGCGGTGAACTTCGTCAAC-3′, reverse primer
5′-CCCCTTACTGCTTCCTTTACCC-3′.
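The product sizes can be cross-checked against the λ coordinates given for each primer pair. A minimal sketch, treating coordinates as inclusive base-pair positions; PRODUCTS and amplicon_length are hypothetical names:

```python
# Check that the lambda coordinates quoted above match the stated product sizes.
# Coordinates are treated as inclusive base-pair positions.

PRODUCTS = {  # expected length (bp): (first bp, last bp)
    162: (12_368, 12_529),
    430: (23_788, 24_217),
    1762: (13_767, 15_528),
}


def amplicon_length(first_bp: int, last_bp: int) -> int:
    """Length of an inclusive interval of base pairs."""
    return last_bp - first_bp + 1


for expected, (first, last) in PRODUCTS.items():
    length = amplicon_length(first, last)
    status = "ok" if length == expected else "MISMATCH"
    print(f"{first:,}-{last:,}: {length} bp ({status})")
```

All three intervals reproduce the stated lengths exactly, which supports reading the coordinates as inclusive.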

PCR reactions contained 1× ThermoPol buffer (NEB), 0.2 mM dATP, 0.2 mM
dCTP, 0.2 mM dGTP, 0.1 mM dTTP, 0.2 mM 5-(3-aminoallyl) dUTP
(Fermentas), 0.25 ng µl−1 λ DNA (NEB) (pUC19 for the 87-nucleotide substrate),
0.5 µM each primer and 0.05 U µl−1 Vent (exo−) polymerase (NEB). The
thermocycler (iCycler, Bio-Rad) program involved initial denaturation at 95 °C
for 2 min, followed by 30 cycles of a denaturation phase at 95 °C for 30 s, an
annealing phase at 60.6, 63, 62.2 or 59.4 °C for 30 s for the 87-, 162-, 430- or
1,762-nucleotide products, respectively, and an extension phase at 72 °C for 0.25,
0.25, 1 and 5 min for the 87-, 162-, 430- and 1,762-nucleotide products,
respectively. The final PCR step was extension at 72 °C for 5 min. The reactions
were then processed with a QIAquick PCR purification kit (Qiagen). Following
purification, the DNA was ethanol-precipitated at −20 °C. To fluorescently label
the PCR products, a 20-µl reaction containing 10-20 µg of PCR-generated DNA
containing amine-modified nucleotides, 200 mM sodium bicarbonate (pH 9.0) and
5 mM ATTO565 NHS-ester (ATTO-TEC GmbH) was incubated for 1-2 h at 25 °C
while protected from light. Alexa Fluor 488 succinimidyl ester (Invitrogen) was
used to label the 87-nucleotide substrate used in the D-loop assay. Following
incubation, 180 µl Nanopure water was added and a QIAquick PCR purification
kit (Qiagen) was used to remove free label. Purified labelled DNA was stored at
4 °C until the strand-separation step.
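The cycling conditions can be written as data, which makes the per-product differences (annealing temperature and extension time) explicit. The sketch below totals hold times only; ramp rates of the thermocycler are not given and are ignored, so real run times would be longer. pcr_program and total_hold_minutes are hypothetical helpers.

```python
# Thermocycling program described above, expressed as data.
# Hold times only; ramp times between temperatures are ignored.

ANNEAL_C = {87: 60.6, 162: 63.0, 430: 62.2, 1762: 59.4}
EXTENSION_S = {87: 15, 162: 15, 430: 60, 1762: 300}   # 0.25, 0.25, 1 and 5 min


def pcr_program(product_nt: int, cycles: int = 30) -> list[tuple[str, float, int, int]]:
    """Steps as (name, temperature in degC, hold in s, repeats)."""
    return [
        ("initial denaturation", 95.0, 120, 1),
        ("denaturation", 95.0, 30, cycles),
        ("annealing", ANNEAL_C[product_nt], 30, cycles),
        ("extension", 72.0, EXTENSION_S[product_nt], cycles),
        ("final extension", 72.0, 300, 1),
    ]


def total_hold_minutes(program: list[tuple[str, float, int, int]]) -> float:
    """Summed hold time of all steps, in minutes."""
    return sum(hold_s * repeats for _, _, hold_s, repeats in program) / 60


for product in ANNEAL_C:
    print(f"{product} nt: ~{total_hold_minutes(pcr_program(product)):.0f} min of hold time")
```

For example, the 430-nucleotide program sums to 67 min of hold time and the 1,762-nucleotide program to 187 min, almost all of the difference coming from the longer extension step.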
Alkali denaturation, in combination with the single 5′-biotin incorporated from the forward primer in the PCR reaction, was used to produce ssDNA from the fluorescently labelled duplex PCR product as follows. 800 µl avidin-agarose (400 µl settled gel; Thermo Scientific) was prepared in a 1.5-ml Eppendorf tube, using centrifugation to pellet the agarose. All centrifugation steps were performed using a bench-top centrifuge at 4,524g for 1 min. The resin was pelleted and washed three times with 1 ml binding and wash buffer (10 mM Tris-HCl (pH 7.5), 1 mM EDTA and 150 mM NaCl). Fluorescently labelled biotinylated dsDNA (~10–20 µg, from the PCR reaction above) was diluted to 1 ml with binding and wash buffer. The diluted DNA was added to the prepared avidin-agarose and mixed end-over-end for 1 h while protected from light. The agarose and bound DNA were pelleted by centrifugation and washed three times with 1 ml binding and wash buffer to remove unbound DNA. The ssDNA was eluted by alkali denaturation of the dsDNA: 200 µl of 0.15 M NaOH was added to the pelleted agarose and mixed end-over-end for 10 min to release the non-biotinylated strand. The slurry was transferred to an empty micro-spin column (Bio-Rad) and centrifuged at 4,700g to recover the eluted ssDNA. A MicroSpin S-400 column (GE Healthcare) was used to exchange the ssDNA into TE buffer. Samples of each fraction were analysed by polyacrylamide or agarose gel electrophoresis. Fractions containing ssDNA were pooled, purified and concentrated with a QIAquick PCR purification kit (Qiagen). The DNA concentration was determined using an extinction coefficient of 8,919 M⁻¹ cm⁻¹ at 260 nm, taking into account a correction factor of 0.34 for absorbance at 260 nm by the dye. The dye concentration was determined using an extinction coefficient of 120,000 M⁻¹ cm⁻¹ at 563 nm.
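The concentration step is a Beer–Lambert calculation in which the dye's own absorbance at 260 nm is subtracted before the DNA is quantified. The sketch below is a minimal illustration under assumptions the Methods do not state: a 1-cm path length, the 8,919 M⁻¹ cm⁻¹ value treated as a per-nucleotide coefficient, and the 0.34 correction factor read as A260(dye)/A563(dye). The absorbance readings are hypothetical, and `labelled_dna_concentrations` is an illustrative name.

```python
# Beer-Lambert estimates for a dye-labelled ssDNA sample, using the coefficients
# quoted in the Methods. Assumptions (not in the source): 1-cm path length,
# EPS_DNA_260 treated as per nucleotide, and DYE_CF_260 = A260(dye) / A563(dye).
EPS_DNA_260 = 8_919.0    # M^-1 cm^-1
EPS_DYE_563 = 120_000.0  # M^-1 cm^-1
DYE_CF_260 = 0.34        # fraction of the dye's 563-nm absorbance seen at 260 nm


def labelled_dna_concentrations(a260, a563, path_cm=1.0):
    """Return (DNA concentration, dye concentration) in mol/l."""
    dye_molar = a563 / (EPS_DYE_563 * path_cm)
    # Remove the dye's contribution at 260 nm before quantifying the DNA.
    a260_dna_only = a260 - DYE_CF_260 * a563
    dna_molar = a260_dna_only / (EPS_DNA_260 * path_cm)
    return dna_molar, dye_molar


# Hypothetical absorbance readings, for illustration only.
dna, dye = labelled_dna_concentrations(a260=0.5, a563=0.12)
print(f"DNA: {dna * 1e6:.1f} uM, dye: {dye * 1e6:.2f} uM, dye:DNA ratio: {dye / dna:.4f}")
```

Without the correction, the same readings would overstate the DNA concentration by roughly 9%, which matters when the dye-to-DNA ratio is used to judge labelling density.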

Flowcell fabrication. Channels and holes were etched by CO₂ laser into glass slides (Fisher Scientific, 25 × 75 × 1 mm) covered with an adhesive abrasive-blasting mask (Epilog) using a 30-W Mini-24 Laser Engraver (Epilog Lasers). Following the engraving step, the slides were blasted with 220-grit silicon carbide (Electro Abrasives) to remove residual laser-ablated glass from the channels. A cover glass (Corning No. 1, 24 × 60 mm) was attached with ultraviolet-curing Optical Adhesive 74 (Norland Products) applied by capillary action. The adhesive was cured by placing the flowcell 30 cm from a 100-W HBO lamp (Zeiss) for 20 min, followed by a final heat cure at 70 °C for 12 h. PEEK tubing with a 0.5-mm inner diameter (Upchurch Scientific) was inserted into each of the etched holes and fixed with 5-minute epoxy (Devcon) to create inlet and outlet connection ports.

Surface preparation of single-channel flowcell for TIRFM experiments. The surface modification procedure was done at 25 °C. The flowcells were cleaned with 1 M NaOH for 30–60 min, and washed twice with 1 ml Nanopure water and then with 1 ml of buffer (25 mM Tris-OAc (pH 7.5), 50 mM NaCl). Biotinylated BSA (1 mg ml⁻¹; Thermo Scientific) in the above buffer was then incubated in the flowcell for 5 min and washed out with 1 ml of buffer. Next, 0.1 mg ml⁻¹ streptavidin (Promega) in buffer was incubated in the flowcell for 5 min, then washed out with 1 ml of buffer. Finally, the flowcell was blocked with 1.5 mg ml⁻¹ Roche Blocking Reagent (Roche) in buffer for 30–60 min and washed with 1 ml buffer. The prepared flowcell was then mounted on the microscope and attached to the syringe pump (KD Scientific).

D-loop assay. RecA and SSB were purified as previously described. The Alexa Fluor 488-labelled 87-nucleotide ssDNA substrate was prepared as described above. A 10-µl reaction containing 25 mM Tris-HCl (pH 7.5), 10 mM MgCl₂, 1 mM DTT, 2 mM ATPγS, 100 µg ml⁻¹ BSA, 4.5 µM RecA and 105 nM fluorescently labelled 87-nucleotide ssDNA was incubated for 8 min at 37 °C. The reaction was started by the addition of 35 nM supercoiled DNA (pUC19) and incubated at 37 °C for 20 min. The reaction was stopped by mixing with 5 µl of stop solution (4.8% SDS, 7 mg ml⁻¹ proteinase K) and incubating for 10 min at 37 °C. Products were resolved by electrophoresis in a 1% UltraPure agarose gel (Invitrogen) using TAE (40 mM Tris, 20 mM acetic acid and 1 mM EDTA) at 100 V until the bromophenol blue had migrated 4 cm. The gel was imaged and analysed with a STORM scanner and ImageQuant software (Molecular Dynamics). The efficiency of the reaction was calculated as the fraction of ssDNA that formed D-loops multiplied by three, to correct for the threefold molar excess of ssDNA (105 nM) relative to supercoiled pUC19 (35 nM) in the reaction.
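The factor of three is simply the ssDNA:plasmid molar ratio in the reaction (105 nM / 35 nM). A minimal sketch of the correction follows, assuming at most one invading ssDNA per plasmid so that the result reads as the fraction of pUC19 converted to D-loops; the gel fraction used is hypothetical and `dloop_efficiency` is an illustrative name.

```python
SSDNA_NM = 105.0  # labelled 87-nt ssDNA in the D-loop reaction
PUC19_NM = 35.0   # supercoiled pUC19


def dloop_efficiency(fraction_ssdna_in_dloops, ssdna_nm=SSDNA_NM, target_nm=PUC19_NM):
    """Fraction of plasmid converted to D-loops, assuming at most one invading
    ssDNA per plasmid: scale the ssDNA fraction by the ssDNA:plasmid molar ratio."""
    return fraction_ssdna_in_dloops * (ssdna_nm / target_nm)


print(SSDNA_NM / PUC19_NM)     # molar excess of ssDNA over plasmid
print(dloop_efficiency(0.10))  # hypothetical gel reading: 10% of ssDNA in D-loops
```

Expressing the result per plasmid rather than per ssDNA makes the efficiency comparable across reactions that use different ssDNA excesses.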

Single-molecule DNA pairing experiments. Nucleoprotein filaments were formed essentially as described previously in SM buffer (25 mM Tris-OAc (pH 7.5), 1 mM DTT and 4 mM Mg(OAc)₂): SSB (at a ratio of 1 SSB monomer to 11 nucleotides), 2 nM (molecules) fluorescent ssDNA and 1 mM ATPγS were incubated for 10 min at 37 °C. RecA was added at a ratio of 1 monomer to 1.7 nucleotides and incubated for 1 h. Nucleoprotein filaments were then diluted tenfold to a final concentration of 0.2 nM in buffer before introduction into the flowcell. In the DNA pairing experiments using TIRFM, biotinylated λ DNA (1 pM, molecules) in SM2 buffer (SM buffer and 50 mM NaCl) was introduced into the flowcell and allowed to bind for several minutes. The flowcell was then washed with 500 µl SM2 buffer to remove free DNA as well as to extend and attach the second end of the λ DNA molecules. The reaction was started by the addition of 0.2 nM nucleoprotein filaments in SM2 buffer. For ensemble experiments visualized by TIRFM, the nucleoprotein filaments and λ DNA were incubated for 1 h (162-nucleotide substrate) or 30 min (430-nucleotide substrate) at 37 °C before visualization in a single-channel flowcell.
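Because SSB and RecA are dosed per nucleotide while the ssDNA is given in molecules, the protein concentrations depend on substrate length. The sketch below converts the stated ratios into concentrations; it assumes the 430-nucleotide substrate purely for illustration (the same ratios apply to the other lengths), and `filament_assembly` is a hypothetical helper name.

```python
def filament_assembly(ssdna_nm=2.0, length_nt=430, nt_per_ssb=11.0,
                      nt_per_reca=1.7, dilution=10.0):
    """Concentrations (nM) at the assembly step and after the tenfold dilution.
    The 430-nt default is an illustrative choice, not a value fixed by the text."""
    nucleotides_nm = ssdna_nm * length_nt  # nM of nucleotides
    return {
        "nucleotides_nM": nucleotides_nm,
        "ssb_monomer_nM": nucleotides_nm / nt_per_ssb,    # 1 SSB monomer : 11 nt
        "reca_monomer_nM": nucleotides_nm / nt_per_reca,  # 1 RecA : 1.7 nt
        "filaments_after_dilution_nM": ssdna_nm / dilution,
    }


for name, value in filament_assembly().items():
    print(f"{name}: {value:.2f}")
```

For the 1762-nucleotide substrate the same call with `length_nt=1762` gives roughly four times more protein, which is why the ratios, rather than fixed concentrations, are reported.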

Visualization of RecA-mediated pairing with individual DNA dumbbells was performed at 37 °C. The flowcell surface was treated for 1 h with BSA (1 mg ml⁻¹) in single-molecule (SM3) buffer (50 mM Tris-OAc (pH 8.2), 50 mM DTT, 1 mM Mg(OAc)₂ and 15% sucrose). Biotinylated λ DNA and buffers were pumped into the flowcell at a linear flow rate of ~100 µm s⁻¹. The channels contained: SM3 buffer, 18 fM streptavidin-coated polystyrene beads (1 µm; Bangs Laboratories) and 5 nM YOYO-1 (Invitrogen) (Fig. 2B, a); SM3 buffer, 100 nM YOYO-1 and 10 pM (molecules) biotinylated λ DNA (Fig. 2B, b); SM3 buffer (Fig. 2B, c); and SM buffer with 15% sucrose (Fig. 2B, d, f). The reaction reservoir contained 0.2 nM nucleoprotein filaments in SM buffer with 15% sucrose and 0.5 mM ATPγS (Fig. 2B, e).
Data analysis. Data were analysed using GraphPad Prism v5.04. The kinetic data were fitted to a single exponential function, $Y = Y_0 + (\mathrm{Plateau} - Y_0)(1 - e^{-kx})$, where $x$ is time and $k$ is the rate constant. In Fig. 4b, c, the time courses do not pass through the origin. We are not certain whether this is an intrinsic characteristic of the homology search or a limitation of the experimental procedure: for example, the time for the DNA to relax from flow-induced stretching after movement into the reservoir. We note that the half-time for the relaxation of extended λ DNA is ~6 s (ref. 22); during this time the dsDNA is not in its equilibrium coiled configuration, and initial interaction with the RecA nucleoprotein filament would be limited by the DNA polymer dynamics.
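Prism fits this model by nonlinear least squares. As a dependency-free stand-in (not Prism's algorithm), the sketch below uses the fact that, for a fixed $k$, the model is linear in $Y_0$ and Plateau: those two are solved exactly by 2×2 least squares, and $k$ is found by golden-section search on $\ln k$. The $Y_0$ term is what absorbs a non-zero intercept like the one discussed above. The time course is synthetic and noise-free, and all function names and search bounds are hypothetical choices.

```python
import math


def one_phase(x, y0, plateau, k):
    """Single exponential: Y = Y0 + (Plateau - Y0) * (1 - exp(-k * x))."""
    return y0 + (plateau - y0) * (1.0 - math.exp(-k * x))


def solve_linear_part(xs, ys, k):
    """For fixed k, Y = Y0*u + Plateau*v with u = exp(-k x), v = 1 - u.
    Solve the 2x2 normal equations; return (Y0, Plateau, sum of squared errors)."""
    a11 = a12 = a22 = b1 = b2 = 0.0
    for x, y in zip(xs, ys):
        u = math.exp(-k * x)
        v = 1.0 - u
        a11 += u * u
        a12 += u * v
        a22 += v * v
        b1 += u * y
        b2 += v * y
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-300:  # degenerate design for this k
        return 0.0, 0.0, math.inf
    y0 = (b1 * a22 - b2 * a12) / det
    plateau = (a11 * b2 - a12 * b1) / det
    sse = sum((y - one_phase(x, y0, plateau, k)) ** 2 for x, y in zip(xs, ys))
    return y0, plateau, sse


def fit_one_phase(xs, ys, k_min=1e-4, k_max=10.0, iterations=100):
    """Golden-section search on ln k; (Y0, Plateau) solved exactly at each k."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = math.log(k_min), math.log(k_max)
    c = hi - invphi * (hi - lo)
    d = lo + invphi * (hi - lo)
    fc = solve_linear_part(xs, ys, math.exp(c))[2]
    fd = solve_linear_part(xs, ys, math.exp(d))[2]
    for _ in range(iterations):
        if fc < fd:  # minimum lies in [lo, d]
            hi, d, fd = d, c, fc
            c = hi - invphi * (hi - lo)
            fc = solve_linear_part(xs, ys, math.exp(c))[2]
        else:        # minimum lies in [c, hi]
            lo, c, fc = c, d, fd
            d = lo + invphi * (hi - lo)
            fd = solve_linear_part(xs, ys, math.exp(d))[2]
    k = math.exp(0.5 * (lo + hi))
    y0, plateau, _ = solve_linear_part(xs, ys, k)
    return y0, plateau, k


# Synthetic, noise-free time course (hypothetical values) with a non-zero intercept.
times = [10.0 * i for i in range(31)]
signal = [one_phase(t, 0.05, 0.8, 0.02) for t in times]
y0_fit, plateau_fit, k_fit = fit_one_phase(times, signal)
t_half = math.log(2) / k_fit  # half-time of the single exponential
print(y0_fit, plateau_fit, k_fit, t_half)
```

With real, noisy data, a standard nonlinear least-squares routine and its confidence intervals would be the more usual choice. This sketch only shows how $Y_0$, Plateau and $k$ relate to the equation in the text.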


21. Bianco, P. R. et al. Processive translocation and DNA unwinding by individual RecBCD enzyme molecules. Nature 409, 374–378 (2001).

22. Perkins, T. T., Quake, S. R., Smith, D. E. & Chu, S. Relaxation of a single DNA molecule observed by optical microscopy. Science 264, 822–826 (1994).
Got to get a grant


A great idea will get applicants only so far. But there are 
other strategies that can add to the chances of success. 


BY KAREN KAPLAN 


With his primary grant coming to an end, neuroscientist Thomas Mrsic-Flogel was more than a little stressed.

He had launched his lab at University College London (UCL) with a career-development fellowship from the Wellcome Trust in London, but it was set to expire by mid-2011. In 2010, with a worldwide recession in full swing, Mrsic-Flogel knew that he was hardly guaranteed to land a new grant.

He decided to apply for another Wellcome fellowship, proposing a project on how neuronal networks process visual stimuli. Applications had a discouraging success rate of about 20%, but the grant could be renewed every five years, which Mrsic-Flogel found attractive. He won the award — a £1.7-million (US$2.7-million) senior research fellowship, which pays his salary and lets him purchase lab equipment and support a couple of graduate research associates.

Mrsic-Flogel attributes his success to more than luck. He followed the application guidelines to the letter, making sure that his proposal was both high-impact and innovative. He spent a year preparing it, including developing his idea and gathering preliminary data. And he sought input from dozens of people, from UCL grant advisers to colleagues in neuroscience and other fields, in effect creating an informal peer-review panel. He revised the document several times, once deleting an entire section, and when something stumped him, Mrsic-Flogel called grant recipients he knew to find out how they had dealt with similar problems.

In the current funding environment, the 
odds of winning a grant or fellowship are very 
slim. But Mrsic-Flogel’s success demonstrates 
some helpful strategies and guidelines — artic- 
ulating an original idea, seeking feedback from 
multiple sources and writing concisely — for 
putting together a winning proposal. 


EXCELLENT SCIENCE 

Before all else, applicants must make sure that they are presenting excellent and original science, say grant programme officers and successful applicants. “You should be proposing a novel kind of research — not just continuing some standard research you're already doing,” says Jochen Wosnitza, director of the Dresden High Magnetic Field Laboratory in Germany and chairman of the review board for the German Research Foundation (DFG) in Bonn, the country's main grant-giving agency. To make sure that their projects are innovative, applicants should bounce ideas off colleagues and painstakingly comb through the literature.

Grant officers are generally looking for 
work that could have an enduring influence. 
It should inherently lead to further study, 
although not necessarily to immediate appli- 
cations. “Have you thought about what hap- 
pens next?” asks David Crosby, programme 
manager for the UK Medical Research Coun- 
cil (MRC) in London and Swindon. Research 
funded by the MRC doesn’t have to lead to a 
disease cure in three years, “but you do need 
to think about the implications of your work,” 
he says. “If you're generating a fundamental 
insight, what is the consequence of that? How 
does that help the whole field? How might it 
go on to be utilized? How will it impact the 
science community and the public at large?” 
Researchers applying for a grant from a mul- 
tinational organization, such as the European 
Commission's Marie Curie Actions, or funding 
from the European Molecular Biology Organi- 
zation (EMBO) in Heidelberg, Germany, will 
also need to explain how their proposal would 
have benefits beyond their own country. 

Early-career researchers should keep in mind that many granting agencies frown on proposals linked to or associated with work done by the applicant's mentor. “You have to show that you're an independent-thinking scientist taking a different track from your former supervisor,” says Gerlind Wallon, deputy director of EMBO and manager of the Young Investigators programme.

There are no hard and fast rules on which funder to approach, say granting and funding agencies. Colleagues with their own grants can offer advice; early-career scientists applying to the US National Institutes of Health (NIH), for example, can get the names of successful grantees from NIH RePORTER (go.nature.com/32v6n5). It can also be extremely helpful to speak directly to the funder; however, programme managers recommend that applicants first learn the agency's remit by closely reading its website and grant materials. “Absolutely come to us,” says Crosby. “Phone up the funder and say, ‘I've got this idea that I think pertains to your strategic interest. You've got a highlight notice on your website that says “systems biology”. What do you mean by that? Does my idea fit into that bracket?’”

Crosby points out that a researcher's institu- 
tion may also have a preference; for example, 
the MRC and Wellcome Trust both fund bio- 
medical proposals, but the MRC pays some 
indirect costs and overheads to the institution 
that other funders don't, and so might be more 
attractive. 


NUTS AND BOLTS 

Applicants must effectively outline their ideas in the application, including a clear and direct hypothesis along with the expected results. Programme managers say that an application for funding to ‘explore a cell receptor's signalling mechanisms’, for example, is unlikely to be successful because it sounds vague and doesn't seek to prove anything. But a proposal to confirm that a particular protein is involved in a cellular reaction, for example — one that includes preliminary results and explains the potential impact of the discovery — would have a far better chance.

Some applications call for both a summary, aimed at reviewers who are not in the relevant field, and an abstract, for those who are. Most also have a section for a research plan, in which applicants can explain technical details. However, reviewers who see an application for the first (and perhaps only) time in a review-panel meeting usually turn immediately to the summary, say grant officers. That is where applicants should persuasively and succinctly explain exactly why their proposal deserves funding. “It's important to be able to clearly articulate your ideas,” says Crosby. “If you can't do that, you're not going to be able to inspire enthusiasm.” Some funders also call for a project description or narrative, but veteran grant-writers say that if there is a choice, it is best to make the strongest case in the summary.

Focus is key. If the summary is too technical or rambling, the application's score will suffer, even if the idea itself is brilliant. “A bad summary is really disastrous,” says Andrea Hutterer, programme manager for EMBO fellowships. “It sets the tone for how I read the rest of the application.”

Applicants must state their research objective clearly and straight away. “The first sentence should begin, ‘The research objective of this proposal is ...’,” says George Hazelrigg, a programme officer for design and integration engineering at the US National Science Foundation. “Every inch from the top that I have to go down in the proposal to find this sentence lowers the rating by about one percentage point.”

It is wise to get editing and streamlining recommendations from as many senior colleagues as possible, both in and outside the research field, and to check the funder's website for advice. In a mock application on the NIH website, the ‘before’ summary, meant to demonstrate pitfalls, is long, rambling and technical (“G-protein over-activation triggers a biochemical signaling cascade that leads to β-AR desensitization and down-regulation ...”), and contains several acronyms. The corrected ‘after’ summary is clear and direct: “Congestive heart failure is a common and lethal disease in the United States. Current medications ... improve survival in some, but not all, patients. ... This research will enhance our understanding of the cellular and molecular mechanisms underlying sympathetic neuron dysfunction that may progress to heart disease, and may identify a possible novel pharmaceutical target.”

Applicants should make sure to request an appropriate amount of funding. Too little and there won't be enough money to finish the project — and it is next to impossible, say grant officers, to get supplementary funding. Too much and reviewers are likely to question the applicant's competence. “It implies that you don't know what you're doing and don't have a realistic grasp of the project,” says Crosby. Applicants can get help with calculations from their department heads, senior supervisors and mentors. For the costs of supplies, such as lab mice, they can talk to the institutional research office.


SWEAT THE SMALL STUFF 

Other fundamental requirements may sound 
mundane or even silly — but failing to adhere 
to them can derail an application (see ‘Grant- 
writing blunders’). Investigators should read 
and follow all application instructions care- 
fully: most stipulate length and format, includ- 
ing particular typefaces, fonts, font sizes and 
margins. It does not pay to deviate from these 
in the hope of cramming in more text or fig- 
ures, warn programme managers. 

“Bend over backwards to give us what we want,” advises Maryrose Franko, senior programme officer for graduate science education at the Howard Hughes Medical Institute in Chevy Chase, Maryland. Reviewers don't want to sift through an application to find an investigator's most significant published work or squint to read the text, she says. “If we say 12-point font and you give us 10, the reviewers don't even want to look at it.”


SET-ROUTES


DOS AND DON'TS


Grant-writing blunders

• Avoid being too ambitious — don't propose a study that would take decades. Grant officers can tell when an applicant is overextending.

• Don't use abbreviations, acronyms, jargon or highly technical language. Reviewers who aren't familiar with your field will get annoyed and may think that you are trying to cover up for a lack of knowledge or ability to carry out the experiment.

• Don't give short shrift to explaining why your proposal is important. Reviewers don't already know. Explain the study's impact, advances and potential.

• Make the application easy to read — don't cram it with text, use too-small fonts or miniaturize any figures.

• Get lots of colleagues from within and outside your field to review your application closely and provide written responses.

• Make sure that you're asking for an appropriate sum. If you request too much or too little, reviewers will conclude that you don't know what you're doing. K.K.


Proposals must be easy to read, agree Stephen Russell and David Morrison, co-founders of Grant Writers' Seminars and Workshops, a consulting business in Los Olivos, California, that helps clients with applications. “Reviewers read grant applications for only one reason — because they have to,” says Russell. To help them, he and Morrison recommend making margins wider than the minimum, using an easy-to-read typeface and font size such as 12-point Arial (or whatever is specified in the instructions), and adding spaces between paragraphs and sections.

Spelling errors and poor grammar may not immediately disqualify an application, but they could lower the score, or at the very least give a bad impression. “Bad English and typos are an annoyance factor that reviewers have to overcome,” says Wallon. “If it's done sloppily, I wouldn't recommend it.”

But scientists don't necessarily need to hire a consultant to make sure that their application is letter-perfect, say programme managers. “Using a commercial consultant gives your application a tone that panel members will detect. We're looking for a contribution from the individual,” says Alex Martin Hobdey, head of the unit for starting grants at the European Research Council in Brussels. Consultant-assisted applications tend to sound too slick or smooth; it is more effective to get editing recommendations from colleagues.


Submissions that are incomplete or past 
deadline are certain to be disqualified. 
Hutterer says that out of the 850 applica- 
tions to EMBO’s fellowship programme 
each year, some 150 are unfinished and thus 
immediately ineligible. And Dennis Abbott, 
a spokesman for the Marie Curie Actions 
programme, decries late submissions. “No 
matter how good your application is, it’s too 
late,” he says. “Deadlines are set for a reason.” 


SHADES OF EXCITEMENT 

Applicants need to communicate the payoffs of the research straight away. Russell says that a common mistake is to write a title that could be reused for future renewal applications. For example, he says, ‘Studies of renal disease’ is accurate but generic. He suggests evoking a salient image or concept — something more like ‘Contribution of anti-idiotype antibodies to pathogenesis of acute glomerulonephritis’. He warns applicants not to let snappiness obscure the content of the proposal — something like ‘Breakthrough treatment strategies to cure acute glomerulonephritis’ draws attention but is sensationalistic and vague.

It helps to be positive and enthusiastic in 
project summaries, abstracts and research 
questions — but to include a back-up plan. 
“You need to say that you expect that this 
approach will work; however, if it doesn’t, 
you will be prepared to do this and this,” says 
Morrison. “It’s all about asserting confidence 
in your ability to do this research, backed up 
by your fallback of alternative strategies.” 

Ultimately, once the mechanics are right, it boils down to convincing reviewers that the application deserves funding. “If you can't convey your excitement and the importance of your proposal and what you think your results will be,” says Franko, “then you're not going to get good scores.”


Karen Kaplan is Nature's assistant Careers editor.
UNITED STATES


Charity supports science


At least 10 of the top 50 US charitable donors of 2011 gave funds to support scientific research, according to the Philanthropy 50 report released on 6 February by The Chronicle of Philanthropy in Washington DC. The top 50 donors gave a total of US$10.4 billion, up from $3.3 billion in 2010. The Chronicle speculates that the increase is due to some economic recovery and a perceived need for funds at universities. Donations included $70 million to the Allen Institute for Brain Science in Seattle, Washington, for neuroscience and genomics research; $59.2 million to the Ellison Medical Foundation in Bethesda, Maryland, for biomedical research; and $25 million to Yale University in New Haven, Connecticut, to launch an energy-research institute.


CHILE 


Tax credit for research 


The Chilean government hopes that a tax incentive will boost investment in research and development (R&D), and create jobs. The scheme triples the maximum tax credit for research-investment costs; eliminates a 15% tax on gross sales, easing the financial burden for entrepreneurs and start-ups; and can offset costs related to securing intellectual-property rights. The law will come into effect this year. Pablo Longueira, Chile's economics minister, expects companies in mining, forestry, energy, agriculture and aquaculture to expand their research. “We believe that many of the new PhDs that are currently being trained outside of the country will return to work for R&D projects under this new law,” he says.


ANIMAL HEALTH 
Allen school expanding 


Recruitment has begun at Washington State University's Paul G. Allen School for Global Animal Health in Pullman, where a new research facility will open in May. By 2015, administrators hope to hire 13 researchers to detect emerging cross-species diseases, develop vaccines and work on transmission control, says director Guy Palmer. Hiring is supported by US$51 million in donations from Microsoft co-founder Paul Allen and the Bill & Melinda Gates Foundation in Seattle, Washington; another $14 million is earmarked for programmes including training students in East Africa.
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SET-ROUTES 


DOS AND DON'TS 
SCIENCE FICTION


PICNIC WITH ANTS


Come fly with me.


BY MARK W. MOFFETT


I'm Gerry Blandsides and this is recording tape 18 for my postdoctoral grant 978-2023, The Geopolitical Significance of the Ants in Namibian Mud Wallows: An Ubersynthesis. It's day nine of my observations on the Pheidole ‘big-headed’ ant colony nesting east of Otjiwarongo in warthog splash puddle and latrine 620.

07:00 71 ants passed by along their foraging trail in the last minute. Their diet remains unclear, though as before they transport bits of blackened material.

07:02 No change.

07:04 Nothin.

07:06 Nada.

07:08 Yawn.

07:10 Zzz.

07:12 Kill me.

07:14 I count 69 ants this last minute. An occasional ant hauls a pebble to one of the rings of sand they've let accumulate near the trail border. (Note to self: this pointless activity makes me imagine that the ants are as bored as I. C'est la vie! For ultimately, I shall triumph with my unifying concept of post-Jurassic hypertrophy of ant neosocial metastructure under conditions of intense swine excreta interactivity and its effects on antennal waving dynamics.)

07:16 73 ants per minute. One ant stops, grooms itself. Fly lands on my nose.

07:18 67 ants. (Note: trail usage remains stable, contrary to the conjectures of those bohemian Yale intellectuals with their fancy graph paper. At least this prediction from Appendix 142 of my grant will be borne out? Finishing my count, I give the grooming ant a cheery salute as it turns in my direction.)

07:20 I slap the fly. The ant stops grooming, runs to the moribund bug. It pauses to look at me, then looks at the fly, then at me again, before scurrying off.

07:22 Six ants gather around the fly, start a small fire under it within one of those rings of sand they had deposited earlier. (Note: fly wings crackle as they burn.) The team rotates the fly so that it roasts evenly, turning from a golden dipterous brown to a deep smoky grey. (Another note: isn't cooking unique to humans? Follow-up grant assured if this finding is reproducible.)

07:24 The charbroiled scent is driving 
me crazy. The ants have taken herbs from 
nearby shrubs, adding the redolence of 
oregano, but with more vibrant undertones! 
I pull a sandwich from my field vest, but the 
baloney disappoints. 


07:26 Ants remove fly from fire, carve 
the smoked meat. (Note: salivating! And 
if I recall correctly, insects have less fat and 
more protein than steak.) 

07:28 The ant that had been grooming 
itself — and I’m certain it is the same indi- 
vidual — has returned with six others. (Note 
to self: as with other Pheidole ant species, 
this recruitment of assistance was doubt- 
less accomplished by use of chemical scent 
that the leader ant releases into the air. The 
authority on such pheromones is Professor 
E. O. Wilson — might he support a field- 
work project on this?) The newly arrived 
ant workers lug half the butchered fly in my 
direction, then they gaze up at me and back 
away slowly. 

Tastes delicious, I knew it would. 

[Long interval of static on recording tape.]
...need a seductive marketing name, like the ones restaurants give ugly game fish. Arthrofowl? Miniquail? Souperfly? Kosherbug? Chicken Little? MicroMcNuggnats? Flying lobster?

17:56 313 ants/minute. 87 little bonfires 
flicker at my feet in the last glow of sunset. 
A steady supply of swatted flies — now 
including crepuscular mosquitoes of the 
genus Anopheles — keeps my ant colony 
busy. Their recipe improves each time I 
squash an ant who overcooks. (Note: natu- 
ral selection in action? I should write a grant 
on this topic also. Then again, forget the sci- 
ence — boring, boring! Ants that cook, who 
cares? Future is assured if I replicate this 
— succulant? aphrodisiant? antbrosia? 
— recipe without them.) 

17:58 The 50/50 split that the ants are giving me seems fair. But still, I calculate I will require thousands of flies to maintain the diet. What must it be like for an elephant to depend on peanuts, handed out one at a time by children? So no more bug repellant for me! Let Bugs Come Hither. Anyway, DEET gives each morsel a decided bitterness — makes the ants queasy as well. Hold on. More ants are watching me. It seems a fly has landed on my ear, and now there's a mosquito on my forehead. Back in two minutes.

[Another interval of indecipherable static.]

19:22 Too many ants are arriving to 
count — 4,000 a minute? No matter how 
many flies I swat, I can no longer keep up 
with their needs. Worse, the flames below 
me have merged into a single conflagration 
that is singeing my hair. My face is blister- 
ing; eyes water from the heat. (Note: Stupid 
ants, how can they cook anything now? Set- 
backs like this could cause delays if I decide 
to approach the Food Network. Wait, look at 
that! The ants swarm my legs, some of them 
carrying herbs. It’s a recruitment response 
a thousand times more intense than they 
show to a fly! Is a different pheromone 
involved? I will take copious notes, but first 
I must figure out why I can’t move.) 

[Recording ends.]


Mark W. Moffett is a Smithsonian entomologist who has won the Lowell Thomas Medal from the Explorers Club and the Bowdoin Medal for writing from Harvard. After completing a PhD under E. O. Wilson, he spent years watching ants for his recent book, Adventures Among Ants.
BRIEF COMMUNICATIONS ARISING


Isotope fractionation in silicate melts 


ARISING FROM G. Dominguez, G. Wilkins & M. H. Thiemens Nature 473, 70-73 (2011) 


Experiments show that temperature gradients in silicate melts lead to isotope fractionation, in which the heavier isotopes concentrate in cold regions and the light isotopes concentrate in hot regions. Dominguez et al. present a phenomenological model based on quantum effects that provides a good fit to these experimental results, and argue that “consideration of the quantum mechanical zero-point energy of diffusing species is essential for understanding diffusion at the isotopic level”. However, we point out that the zero-point energy required to fit their model to experimental results is unphysically large, and that isotopic fractionation similar to that observed in silicate melts is found in systems where quantum effects are absent. Therefore, the conclusion that quantum effects underlie isotope fractionation in silicate melts with temperature gradients is not justified.

To fit experimental data, the Dominguez et al. model requires a zero-point energy (ZPE) for $^{26}\mathrm{Mg}$ of ~0.4 eV. The atomic motion giving rise to the ZPE is vibrational, and can be modelled by a harmonic oscillator for which $\mathrm{ZPE} = \tfrac{1}{2}h\nu$, where $h$ is Planck's constant and $\nu$ is the vibrational frequency. (Here for convenience we consider the wavenumber $\tilde{\nu} = \nu/c$, where $c$ is the velocity of light.) The value $\mathrm{ZPE} \approx 0.4\ \mathrm{eV}$ corresponds to $\tilde{\nu} \approx 6{,}500\ \mathrm{cm^{-1}}$, which is much larger than the highest vibrational frequencies ($\sim 1{,}300\ \mathrm{cm^{-1}}$) observed in anhydrous silicate melts. In fact, $\tilde{\nu} \approx 6{,}500\ \mathrm{cm^{-1}}$ is larger than the vibrational frequency in any material whatsoever (the highest vibrational frequency we are aware of is that for H₂, for which $\tilde{\nu} \approx 4{,}395\ \mathrm{cm^{-1}}$). Thus a ZPE of ~0.4 eV is not physically relevant.
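The conversion behind these numbers follows from $\mathrm{ZPE} = \tfrac{1}{2}h\nu$ and $\tilde{\nu} = \nu/c$, which give $\tilde{\nu} = 2\,\mathrm{ZPE}/(hc)$. The sketch below checks the figures quoted above. The constants are the exact SI values, which I supply here (they are not given in the comment), and the function names are illustrative.

```python
H = 6.62607015e-34    # Planck constant, J s (exact SI value)
C_CM = 2.99792458e10  # speed of light, cm/s (exact SI value)
EV = 1.602176634e-19  # joules per electronvolt (exact SI value)


def zpe_to_wavenumber(zpe_ev):
    """Harmonic oscillator: ZPE = h*nu/2 and nu_tilde = nu/c, so nu_tilde = 2*ZPE/(h*c)."""
    return 2.0 * zpe_ev * EV / (H * C_CM)


def wavenumber_to_zpe(nu_tilde_cm):
    """Inverse conversion: ZPE in eV for a wavenumber in cm^-1."""
    return 0.5 * H * C_CM * nu_tilde_cm / EV


print(f"ZPE 0.4 eV -> {zpe_to_wavenumber(0.4):.0f} cm^-1")
print(f"1,300 cm^-1 (silicate melt upper limit) -> {wavenumber_to_zpe(1300):.3f} eV")
print(f"4,395 cm^-1 (H2) -> {wavenumber_to_zpe(4395):.3f} eV")
```

Read in reverse, even the H₂ vibration corresponds to a ZPE of only about 0.27 eV, and the highest silicate-melt modes to about 0.08 eV.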

The unphysically large ZPE in the model of Dominguez et al. leads to predictions of relative diffusivities of isotopes that are in poor agreement with experiments. For example, their model (equations (11) and (12), with ZPE($^{24}$Mg) = 0.4 eV) predicts $D(^{24}\mathrm{Mg})/D(^{26}\mathrm{Mg}) = 1.13$ at 1,500 K. In contrast, experiments on silicate melts find $D(^{24}\mathrm{Mg})/D(^{26}\mathrm{Mg}) = 1.004$ (ref. 2). Thus, the Dominguez et al. model predicts an isotope effect for relative diffusivities that is more than 30 times larger than found experimentally (13% versus 0.4%).
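A simplified estimate reproduces the size of the predicted effect. This estimate is our own illustration, not the Dominguez et al. equations (11) and (12). It assumes that the diffusivity ratio is set by the ZPE difference, $D(^{24}\mathrm{Mg})/D(^{26}\mathrm{Mg}) \approx \exp(\Delta\mathrm{ZPE}/k_BT)$. It also assumes that ZPE scales as $m^{-1/2}$, as it does for a harmonic oscillator with a fixed force constant. Under these assumptions, a 0.4 eV ZPE already gives a ~13% effect at 1,500 K.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant, eV/K

def diffusivity_ratio(zpe_light_ev: float, m_light: float, m_heavy: float,
                      temperature_k: float) -> float:
    """Simplified estimate of D(light)/D(heavy) = exp(dZPE / kT).

    Assumes ZPE ~ m**-0.5 (harmonic oscillator, same force constant);
    this is an illustrative approximation, not the published model.
    """
    zpe_heavy = zpe_light_ev * math.sqrt(m_light / m_heavy)
    return math.exp((zpe_light_ev - zpe_heavy) / (K_B_EV * temperature_k))

model = diffusivity_ratio(0.4, 24.0, 26.0, 1500.0)
measured = 1.004  # experimental D(24Mg)/D(26Mg) quoted in the text (ref. 2)
print(f"estimated D24/D26 = {model:.3f}")
print(f"predicted / measured isotope effect = {(model - 1) / (measured - 1):.0f}x")
```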

Finally, we note that isotope fractionation in temperature gradients occurs in systems where quantum effects are not relevant. This implies that quantum effects are not a necessary condition for isotope fractionation to occur (whereas they are a necessary condition in the Dominguez et al. model). For example, significant fractionation of isotopes is seen in gases held in a temperature gradient. In gases, quantum ZPE (arising from confinement) plays no role because molecules are typically far apart. Thermal fractionation of isotopes is also observed in molecular dynamics simulations of condensed-phase systems based on classical mechanics. These simulations ignore quantum effects and, in contrast to the model of Dominguez et al., include no phenomenological considerations. In both of these cases, heavier isotopes concentrate in cold regions and light isotopes concentrate in hot regions, consistent with experimental observations on silicate melts and all other condensed-phase systems that have been studied. This effect is understood theoretically in terms of classical mechanics, and quantitative agreement is obtained between this theory and experiment.


Dominguez et al. reply


REPLYING TO D. J. Lacks, J. A. Van Orman & C. E. Lesher Nature 482, http://dx.doi.org/10.1038/nature10764 (2012)


Lacks et al. argue that our model of isotopic fractionation in thermal gradients in silicate melts does not agree with measurements of the ratio of diffusivities seen in silicate melts. This statement over-interprets our model by applying it to non-steady-state problems, such as chemical fractionation. The model we presented treats the quantized energy levels of the transition state as being equal to each other (the partition function $Z(\mathrm{TS}) = 1$). This was warranted, as our main interest was in finding the steady-state solution to isotopic fractionation in a closed system (which is insensitive to the transition state). The potential importance of the transition state in determining the ratios of diffusivities of He isotopes in a geologic system has previously been noted. Future work will need to clarify the importance of the transition state for kinetic isotopic fractionation in silicate systems, particularly for the ratio of diffusivities.

Lacks et al. point out that isotopic fractionation due to temperature gradients in the gas phase has been observed (for example, ref. 4), but they do not provide proof that classical mechanics quantitatively explains these observations. Furthermore, Lacks et al. (and references therein) provide no evidence that molecular dynamics simulations reproduce the isotopic fractionations observed in real gas systems, much less in high-temperature condensed phases. A full understanding of isotopic fractionation for complex (diatomic, polyatomic) species, even in the gas phase, is likely to require quantum mechanics because of the involvement of quantized vibrations.

Lacks et al. present no evidence that molecular dynamics simulations based on Lennard-Jones interactions can reproduce the isotopic fractionations of elements. Nor do they show that such simulations capture the strong potential-energy interactions that characterize silicate melts and other systems in which diffusion is a strong function of temperature. Because of the magnitude of the activation energies involved, only a very small fraction $f$ of all particles acquire enough energy to overcome activation-energy barriers: $f \sim \exp(-E_a/k_BT)$, where $E_a$ is the activation energy, $k_B$ is Boltzmann's constant and $T$ is temperature. For typical activation energies of 2–3 eV, this fraction is rather small (~$10^{-8}$). Molecular dynamics simulations typically employ ~$10^{6}$ particles, so this small fraction makes it difficult for them to capture realistically the thermodynamics associated with isotopic fractionation in a strongly interacting condensed-matter system. We suggest caution in interpreting simple binary-mixture molecular dynamics simulations that rely on ‘unphysical’ means of implementing heat transport.
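The size of $f$ can be checked with a few lines of Python. The text does not state the temperature behind the ~$10^{-8}$ estimate. The sketch assumes $T = 1{,}500$ K, the temperature used in the Lacks et al. comparison, and a simulation of ~$10^{6}$ particles, so the numbers are illustrative only.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant, eV/K

def activated_fraction(e_a_ev: float, temperature_k: float) -> float:
    """Boltzmann estimate f ~ exp(-E_a / k_B T) of particles above the barrier."""
    return math.exp(-e_a_ev / (K_B_EV * temperature_k))

T = 1500.0           # assumed melt temperature, K
N_PARTICLES = 1e6    # assumed size of a molecular dynamics simulation

for e_a in (2.0, 2.5, 3.0):
    f = activated_fraction(e_a, T)
    print(f"E_a = {e_a} eV: f = {f:.1e}, expected activated particles = {f * N_PARTICLES:.1e}")
```

Under these assumptions, fewer than one particle in the whole simulation is above a 2.5 eV barrier at any instant. This is the sampling problem the reply describes.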

Last, Lacks et al. point out that the vibrational frequencies needed to explain the steady-state fractionations are physically unrealistic. We note that this frequency may or may not correspond to a physical frequency. The initial reactant state consists of three independent vibrational components, which we combined into one effective frequency for the sake of clarity and simplicity. The effective frequency would therefore be higher than each of the independent components. The infrared vibrational features that are observed in natural systems are sensitive only to differences in energy levels, not to the zero-point energy. Infrared vibrational features in a silicate melt are overwhelmingly dominated by modes associated with Si–O stretching and bending. It is therefore unclear whether Raman or infrared vibrational spectra include features associated with diffusing interstitial species such as Mg, Fe and Ca.
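One way to read the ‘effective frequency’ argument is sketched below. This is our interpretation, not a formula given in the reply. Suppose three independent harmonic components each contribute $\tfrac{1}{2}h\nu_i$, and a single oscillator must carry the same total ZPE. Then $\nu_\mathrm{eff} = \nu_1 + \nu_2 + \nu_3$, which exceeds every individual $\nu_i$. The component wavenumbers below are hypothetical values chosen only to illustrate this.

```python
def effective_wavenumber(components_cm: list[float]) -> float:
    """Single-oscillator wavenumber with the same total ZPE as the components.

    Total ZPE = sum(h c nu_i / 2) = h c nu_eff / 2  =>  nu_eff = sum(nu_i).
    """
    return sum(components_cm)

# Hypothetical component wavenumbers (cm^-1), for illustration only.
components = [2150.0, 2150.0, 2150.0]
nu_eff = effective_wavenumber(components)
print(f"effective wavenumber = {nu_eff:.0f} cm^-1")
print("higher than every component:", all(nu_eff > c for c in components))
```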

In short, Lacks et al. have provided no new data or physical model that quantitatively explains the empirical observations of steady-state isotopic fractionation in silicate melts. Furthermore, over 50 years of work on isotope effects in physical chemistry and isotope geochemistry supports the role of quantum mechanics in isotopic fractionation processes.
Abrupt acceleration of a ‘cold’ ultrarelativistic wind from the Crab pulsar


F. A. Aharonian, S. V. Bogovalov & D. Khangulyan


Pulsars are thought to eject electron–positron winds that energize the surrounding environment, forming a pulsar wind nebula. The pulsar wind originates close to the light cylinder, the surface at which the pulsar co-rotation velocity equals the speed of light, and carries away much of the rotational energy lost by the pulsar. Initially the wind is dominated by electromagnetic energy (Poynting flux), but later this is converted to the kinetic energy of bulk motion. It is unclear exactly where this takes place and to what speed the wind is accelerated. Some preferred models imply a gradual acceleration over the entire distance from the magnetosphere to the point at which the wind terminates, but a rapid acceleration close to the light cylinder cannot be excluded. Here we report that the recent observations of pulsed, very-high-energy γ-ray emission from the Crab pulsar are explained by the presence of a cold ultrarelativistic wind dominated by kinetic energy. The wind is cold in the sense that its electrons have low energy in the frame of the moving plasma. The conversion of the Poynting flux to kinetic energy should take place abruptly in the narrow cylindrical zone of radius between 20 and 50 light-cylinder radii centred on the axis of rotation of the pulsar, and should accelerate the wind to a Lorentz factor of $(0.5-1.0)\times10^{6}$. Although the ultrarelativistic nature of the wind does support the general model of pulsars, the requirement of the very high acceleration of the wind in a narrow zone not far from the light cylinder challenges current models.

The Crab pulsar is one of the brightest γ-ray sources in the sky. Both its light curve and its energy spectrum have been studied in great detail by the Large Area Telescope on board NASA's Fermi Gamma-ray Space Telescope (Fermi). The phase-averaged spectrum is best fitted by a power law with a photon index of $\alpha = 1.97$ and an exponential cut-off at $E_c = 5.8$ GeV (Fig. 1). Modified ‘outer gap’ models do allow an extension of the spectrum up to 10 GeV. However, the detection of pulsed, very-high-energy (VHE) γ-ray emission demands a different radiation component. Some authors extrapolate the fluxes reported by Fermi into the VHE domain as a power law with photon index $\alpha \approx 3.8$. They then claim that such a formal fit is evidence that γ-rays of gigaelectronvolt (GeV) energies have the same magnetospheric origin as those of teraelectronvolt (TeV) energies. This interpretation in fact requires a drastic revision of the basic concepts currently used in magnetospheric models. Moreover, assuming a magnetospheric origin for radiation over the entire γ-ray domain conflicts with two observations. First, it contradicts the essentially different light curves reported at GeV (ref. 10) and TeV (refs 7, 9) energies, unless the production sites of these two components are well separated. Second, it contradicts the apparent tendency of the spectrum to flatten above 100 GeV (Fig. 1).
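The sketch below shows how strongly an exponential cut-off at 5.8 GeV suppresses the flux at 100 GeV. It evaluates the shape of the Fermi fit quoted above, $F_E \propto E^{2-\alpha}\exp(-E/E_c)$ with $\alpha = 1.97$ and $E_c = 5.8$ GeV. The normalization is omitted because only ratios are used. For comparison, it also evaluates a pure power law with photon index 3.8 over the same energy range. Anchoring both descriptions at 10 GeV is our choice for illustration.

```python
import math

ALPHA_FERMI = 1.97   # photon index of the Fermi phase-averaged fit
E_CUT_GEV = 5.8      # exponential cut-off energy, GeV
ALPHA_VHE = 3.8      # photon index of the power-law extrapolation to VHE

def cutoff_shape(e_gev: float) -> float:
    """Unnormalized energy-flux shape E^(2 - alpha) * exp(-E / E_c)."""
    return e_gev ** (2.0 - ALPHA_FERMI) * math.exp(-e_gev / E_CUT_GEV)

def power_law_shape(e_gev: float) -> float:
    """Unnormalized energy-flux shape E^(2 - alpha) of a pure power law."""
    return e_gev ** (2.0 - ALPHA_VHE)

# Flux drop from 10 GeV to 100 GeV under each description.
drop_cutoff = cutoff_shape(100.0) / cutoff_shape(10.0)
drop_power = power_law_shape(100.0) / power_law_shape(10.0)
print(f"exponential cut-off:   F(100 GeV)/F(10 GeV) = {drop_cutoff:.1e}")
print(f"power law, index 3.8:  F(100 GeV)/F(10 GeV) = {drop_power:.1e}")
```

Between 10 and 100 GeV, the cut-off form falls by roughly five orders of magnitude more than the power law. Under these assumptions, detectable pulsed flux above 100 GeV cannot be the tail of the cut-off spectrum.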

A natural and more plausible site of production of pulsed VHE γ-rays is the ultrarelativistic wind illuminated by photons originating in the pulsar's magnetosphere and/or the surface of the neutron star. In the case of the Crab pulsar, the phase-averaged flux of the pulsed (magnetospheric) component exceeds the flux of the thermal emission of the neutron star by two orders of magnitude. The combination of the hard spectral energy distribution of the pulsed emission and the
Figure 1 | Spectral energy distribution of γ-ray radiation produced by the pulsar magnetosphere and by the pulsar wind. Symbols show the reported γ-ray fluxes with 1-s.d. error bars. Curves show theoretical predictions (this work). The Fermi Large Area Telescope points are best fitted by the function $F_E = 3.8\times10^{-13}\,E^{0.03}\exp[-E/5.8\,\mathrm{GeV}]$ J m$^{-2}$ s$^{-1}$ (dashed grey line).
Assuming a slightly harder spectrum in the cut-off region, with 

Fp =3.8X 10 Sexpl—(E/7 GeV)°*] Jm~*s_! (solid grey line), the 
MAGIC ‘mono’ data points can be explained as well. The mono 100-GeV point differs by a factor of three from the flux measured by the two MAGIC telescopes in the more reliable stereoscopic regime; because of its large systematic uncertainties, it perhaps ought to be discarded. This spectrum is somewhat harder than that predicted by standard magnetospheric models, but does not challenge them. The inverse-Compton γ-ray emission of the cold ultrarelativistic wind can naturally explain the pulsed γ-ray fluxes reported above 100 GeV. The solid light-blue, blue and green curves are calculated under the assumption of ‘instant’ acceleration of the wind at the fixed radius $R_w$. In principle, the acceleration can start earlier, but closer to the light cylinder the acceleration rate should be modest; otherwise it would lead to overproduction of inverse-Compton γ-rays. Earlier acceleration is demonstrated by the dashed black curve, which is calculated under the assumption that acceleration starts at
the light cylinder with a rate that increases in proportion with R® up to 

$R_w = 30R_L$, where the Lorentz factor equals $5.5\times10^{5}$ (Supplementary Information). The solid red curve corresponds to the case in which the Poynting flux transformation takes place within the $20R_L$–$50R_L$ zone, assuming that the wind's acceleration rate is independent of distance; the maximum Lorentz factor, achieved at $50R_L$, is set to $10^{6}$. (The dotted grey line corresponds to the superposition of the red and solid grey lines and shows the transition between the two radiation components.) Because the density of target photons decreases with distance, most of the VHE radiation is produced at around $30R_L$, with a Lorentz factor close to $5\times10^{5}$. This explains the general similarity of the red curve to the instant-acceleration curves, except in the highest-energy region, where the sharp cut-off of the red curve is shifted to ~500 GeV.
reduction of the Compton cross-section due to the Klein–Nishina effect means that the X-ray band is the main contributor to the Comptonization of the wind. The X-ray flux is well measured up to 100 keV (ref. 14). The calculations of the inverse-Compton radiation therefore depend essentially on the site and the dynamics (speed) of the transformation of the Poynting flux to kinetic energy of bulk motion.

We assume that at a distance $R_w$ from the pulsar, the wind is accelerated to the Lorentz factor $\Gamma_w$ (Fig. 2). Particles of the accelerated wind cannot move purely radially, because the wind should carry both the energy and the angular momentum lost by the pulsar. We can define the trajectory of the wind particles from the relation between the rotational energy losses ($\dot{E}_\mathrm{rot}$) and angular momentum losses ($\dot{M}_\mathrm{rot}$), $\dot{E}_\mathrm{rot} = \Omega\dot{M}_\mathrm{rot}$. Here $\Omega$ is the angular velocity of the rotating sphere and a dot denotes a time derivative. Each particle of the wind carries energy $\Gamma_w mc^2$ and angular momentum $\Gamma_w m r_\perp v$, where $m$, $r_\perp$ and $v$ are the particle's mass, lever arm and speed, respectively, and $c$ is the speed of light. Because $\Gamma_w m r_\perp v\Omega = \Gamma_w mc^2$, the lever arm $r_\perp = c^2/(v\Omega) \approx c/\Omega$ equals the light-cylinder radius. Particles in the accelerated wind therefore move along straight lines tangent to the light cylinder. As a result, all photons emitted by the magnetosphere collide with electrons of the wind at a non-zero angle, $\theta$, producing inverse-Compton γ-rays. The γ-ray production efficiency depends on the electron Lorentz factor, the density of the target photons and the interaction angle. Because the cold wind carries almost the entire spin-down luminosity, even a tiny efficiency of about $\kappa \sim 10^{-6}$ should be sufficient to produce detectable γ-rays. The corresponding energy flux is $F_E = \kappa\dot{E}_\mathrm{rot}/(4\pi d^2) \sim 10^{-15}$ J m$^{-2}$ s$^{-1}$, where $d \approx 6\times10^{19}$ m is the distance to the Crab.
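The two order-of-magnitude estimates in this paragraph can be checked numerically. To obtain $R_L = c/\Omega$, the sketch assumes a Crab rotation period of about 33.7 ms, a standard value that is not quoted in the text. It takes the spin-down luminosity $5\times10^{31}$ J s$^{-1}$ from the Figure 2 caption, with $\kappa = 10^{-6}$ and $d = 6\times10^{19}$ m.

```python
import math

C = 2.998e8  # speed of light, m/s

def light_cylinder_radius(period_s: float) -> float:
    """R_L = c / Omega with Omega = 2 pi / P (the lever arm for v -> c)."""
    return C / (2.0 * math.pi / period_s)

def energy_flux(kappa: float, e_rot_w: float, distance_m: float) -> float:
    """F_E = kappa * E_rot / (4 pi d^2), in J m^-2 s^-1."""
    return kappa * e_rot_w / (4.0 * math.pi * distance_m ** 2)

P_CRAB = 0.0337   # assumed Crab rotation period, s
E_ROT = 5e31      # spin-down luminosity, J/s
D_CRAB = 6e19     # distance to the Crab, m

print(f"R_L ~ {light_cylinder_radius(P_CRAB):.1e} m")
print(f"F_E ~ {energy_flux(1e-6, E_ROT, D_CRAB):.1e} J m^-2 s^-1")
```

The results, $R_L \approx 1.6\times10^{6}$ m and $F_E \approx 1.1\times10^{-15}$ J m$^{-2}$ s$^{-1}$, agree with the orders of magnitude used in the text.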

Generally, the light curve of the target photons should be reflected in the time structure of the inverse-Compton γ-ray signal. The two cannot be identical, however, owing, for example, to effects of anisotropic inverse-Compton scattering. More importantly, geometrical effects may lead to non-negligible differences between the arrival times of the target-photon pulses and the secondary γ-ray pulses (Fig. 3). For a wind accelerated close to the light cylinder, the predicted γ-ray signal appears shifted relative to the reported γ-ray data by $\Delta t \approx 0.1T$. By contrast, for wind acceleration at $R_w = 30R_L$, the widths and positions of the predicted and observed γ-ray peaks (P1 and P2) are in very good agreement. For an isotropic wind, however, the predicted P1/P2 flux ratio of the γ-ray signal mimics the X-ray light curve (Fig. 3, black crosses). The reported γ-ray data seem instead to correspond to a smaller ratio, P1/P2 < 1. A non-negligible wind anisotropy can explain this, because it would introduce noticeable corrections to the shape of the γ-ray light curve in general and to the P1/P2 ratio in particular (Fig. 3). The large uncertainties in the present γ-ray data prevent us from reaching a strong conclusion in this regard. Better VHE γ-ray light curves should, in future, allow the strength and the character of the wind anisotropy to be decisively probed.
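The ~0.1$T$ shift can be reproduced from the two contributions listed in the Figure 3 caption. First, the target photon leaves the pulsar earlier by $\theta T/2\pi$. Second, the scattered photon travels an extra path $R_w(1-\cos\theta)$. For particles moving on lines tangent to the light cylinder, the sketch takes $\sin\theta = R_L/R_w$ and uses $R_L = cT/2\pi$. This geometric identification is our reading of the text, so the sketch is illustrative.

```python
import math

def time_shift_fraction(r_over_rl: float) -> float:
    """Arrival-time shift dt/T of the IC gamma-ray pulse (negative = earlier).

    dt = -theta*T/(2 pi) + R (1 - cos theta) / c, with sin(theta) = R_L / R
    and R / c = (R / R_L) * T / (2 pi), since R_L = c T / (2 pi).
    """
    theta = math.asin(1.0 / r_over_rl)
    return (-theta + r_over_rl * (1.0 - math.cos(theta))) / (2.0 * math.pi)

for r in (1.0, 30.0):
    approx = -1.0 / (4.0 * math.pi * r)  # small-angle limit -(T/4pi) R_L/R
    print(f"R_w = {r:>4.0f} R_L: dt/T = {time_shift_fraction(r):+.4f} (small-angle {approx:+.4f})")
```

At the light cylinder the shift is about $-0.09T$. At $30R_L$ it falls to about $-0.003T$, matching the small-angle limit $-(T/4\pi)R_L/R_w$.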

GeV γ-rays have a light curve that is essentially different from the reported VHE light curves. This suggests that GeV and TeV γ-rays are produced in regions well separated from each other. The spectral energy distribution of the time-averaged GeV and TeV signals supports this conclusion. As demonstrated in Fig. 1, the entire γ-ray region can be considered a superposition of two separate components. Indeed, introducing a new, flat-spectrum VHE component of the Comptonized wind, in addition to the nominal (magnetospheric) GeV component, allows the reported data in the GeV-to-TeV energy interval to be smoothly matched.

Although inverse-Compton γ-rays are produced by mono-energetic electrons, the spectral energy distribution of γ-rays in the range of tens to hundreds of GeV is quite flat. This is caused by the combination of effects related to the broad power-law distribution of seed photons and the transition of the Compton cross-section from the Thomson regime to the Klein–Nishina regime. On the other hand, the spectrum is expected to have a very sharp cut-off at $E = \Gamma_w mc^2$. This cut-off can serve as a distinct signature of a wind origin for the γ-rays, and should also allow us to determine the Lorentz factor of the wind. In fact, the measurements available at present do not allow a strong deviation of the Lorentz factor from $5\times10^{5}$. We note that the calculations do not depend on the ‘magnetization parameter’ $\sigma$ (the ratio of the electromagnetic energy flux to the kinetic energy flux) as long as $R_w \gg R_L$. However, formally we can explain the pulsed VHE emission even for $\sigma = 1$. In this case, the acceleration should occur
closer to the pulsar (Ry « 1/a"”) to compensate for the reduction in 
the wind's kinetic energy. But in this case, the inverse-Compton γ-ray radiation is expected to have quite different spectral and temporal features.
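The cut-off condition $E = \Gamma_w mc^2$ converts directly between Lorentz factor and cut-off energy. The one-function sketch below uses the electron rest energy $mc^2 \approx 0.511$ MeV.

```python
ELECTRON_REST_ENERGY_MEV = 0.51099895  # m_e c^2, MeV

def cutoff_energy_gev(lorentz_factor: float) -> float:
    """Sharp inverse-Compton cut-off E = Gamma_w * m c^2, in GeV."""
    return lorentz_factor * ELECTRON_REST_ENERGY_MEV / 1000.0

for gamma in (5e5, 1e6):
    print(f"Gamma_w = {gamma:.0e}: cut-off at ~{cutoff_energy_gev(gamma):.0f} GeV")
```

A Lorentz factor of $10^{6}$ puts the cut-off at about 500 GeV, the value quoted for the red curve in Fig. 1. A Lorentz factor of $5\times10^{5}$ gives about 255 GeV.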

The above estimates of the location of the wind's acceleration site and its Lorentz factor are quite robust. They are, however, obtained under the assumption that the transformation of the Poynting flux proceeds very quickly, at a specific radius between $R_w$ and $R_w + \delta R_w$, with $\delta R_w/R_w \ll 1$. This is not an obvious assumption; it is a working hypothesis that the wind acceleration takes place in a narrow zone at the radius $R_w \approx 30R_L$. We cannot a priori exclude the possibility
Figure 2 | Complex comprising the pulsar magnetosphere, the ultrarelativistic wind and the pulsar wind nebula. Dense electron (e$^-$)–positron (e$^+$) plasma produced in the pulsar magnetosphere by pair-creation processes initiates an electron–positron wind at the light cylinder, which has radius $R_L \sim 10^{6}$ m. Initially, the rotational energy lost by the pulsar, $\dot{E}_\mathrm{rot} = 5\times10^{31}$ J s$^{-1}$, is released mainly in the form of electromagnetic energy (Poynting flux), and the wind's Lorentz factor therefore cannot be very large. At a distance $R_w$, the Poynting flux is converted to the kinetic energy of bulk motion (green zone), leading to an increase in
the bulk-motion Lorentz factor to at least’® 

Ty ~ 10°. The termination of the wind by a 
standing reverse shock at $R_s \sim 3\times10^{15}$ m boosts the energy of the electrons to $10^{12}$ eV and randomizes their pitch angles. The radiative cooling of these electrons through the synchrotron and inverse-Compton processes results in an extended non-thermal source, the Crab nebula.
Figure 3 | Formation of the pulsed VHE inverse-Compton γ-ray signal in the wind of the Crab pulsar. a, Geometry of the inverse-Compton scattering of magnetospheric X-rays by the electron–positron wind. b, Theoretical γ-ray light curves of the wind presented together with the reported VHE data. The velocity of the accelerated wind is tangential to the light cylinder (the direction of motion of electrons towards the observer is shown by the dashed green arrow). The interaction of electrons with the magnetospheric X-rays occurs predominantly at a distance $R \approx R_w$, where the wind is accelerated. Owing to the decrease in the target photon density with distance, the production of inverse-Compton γ-rays is suppressed at larger distances. The target X-ray photon converted to a VHE γ-ray photon reaches the observer earlier than an ‘identical’ photon emitted directly towards the observer. Two factors contribute to the time shift, $\Delta t$. First, the up-scattered X-ray photon is emitted by the pulsar earlier, by a time $\theta T/2\pi$, where $T$ is the pulsar period. Second, it travels an additional path length of $R_w[1-\cos\theta]$. For $R_w \gg R_L$, the time shift is negligibly small: $\Delta t \approx -(T/4\pi)R_L/R_w$. For acceleration of the isotropic pulsar wind at $R_w = 30R_L$, the γ-ray light curve (solid blue line) closely resembles the shape of the measured X-ray light curve (black crosses). For wind accelerated close to the light cylinder, the γ-ray light curve is shifted and somewhat broadened compared with wind accelerated at $R_w \gg R_L$. The anisotropy of the wind can also strongly deform the γ-ray light curve; in particular, it can change the ratio of the fluxes corresponding to peaks P1 and P2. The solid red line is calculated for an anisotropy factor proportional to the square of the sine of the angle between the line of sight and the direction of the magnetic moment. This light curve seems to agree better with the VERITAS and MAGIC points than the light curve for the fully isotropic wind. However, the statistical and systematic uncertainties of the observations do not allow a definite conclusion in this regard; only Poisson error bars corresponding to the total count rates are shown on the plot.


that the wind is gradually accelerated starting from the edge of the magnetosphere, but our numerical calculations show that this cannot be the case (Fig. 1 and Supplementary Information). Gradual acceleration would produce a large number of high-energy electrons close to the light cylinder. These electrons would in turn produce inverse-Compton γ-rays prolifically, in contradiction with the reported fluxes. Thus, the effective acceleration of the wind should start not much before the radius of $30R_L$, and not much beyond it. Fig. 1 shows such a case, assuming a linear acceleration rate of $\Gamma(R) = \Gamma_0 + a(R/R_L - 1)$ within the $20R_L$–$50R_L$ radial interval and a maximum Lorentz factor of $10^{6}$ achieved at $50R_L$. The corresponding γ-ray spectrum is smoother than the energy spectra predicted for instant acceleration, and fits the VHE spectral points better (Fig. 1), with the sharp cut-off in the γ-ray spectrum shifted to 500 GeV. Wind acceleration within the $20R_L$–$50R_L$ interval seems to be a physically more realistic scenario than instant acceleration. Even so, this is still quite a narrow zone, and the acceleration of the wind up to a Lorentz factor of $10^{6}$ is therefore quite abrupt. This conclusion disagrees with alternative models, for example the so-called reconnection models of pulsar wind nebulae. These models assume that the transformation of the Poynting flux to kinetic energy of bulk motion is a slow process that takes place over the entire region of the unshocked wind.
The same pocket in menin binds both MLL and JUND but has opposite effects on transcription


Jing Huang, Buddha Gurung, Bingbing Wan, Smita Matkar, Natalia A. Veniaminova, Ke Wan, Juanita L. Merchant, Xianxin Hua & Ming Lei


Menin is a tumour suppressor protein whose loss or inactivation causes multiple endocrine neoplasia 1 (MEN1), a hereditary autosomal dominant tumour syndrome that is characterized by tumorigenesis in multiple endocrine organs. Menin interacts with many proteins and is involved in a variety of cellular processes. Menin binds the JUN family transcription factor JUND and inhibits its transcriptional activity. Several MEN1 missense mutations disrupt the menin–JUND interaction, suggesting a correlation between the tumour-suppressor function of menin and its suppression of JUND-activated transcription. Menin also interacts with mixed lineage leukaemia protein 1 (MLL1), a histone H3 lysine 4 methyltransferase. In this context, menin functions as an oncogenic cofactor to upregulate gene transcription and promote MLL1-fusion-protein-induced leukaemogenesis. A recent report found that menin tethers MLL1 to the chromatin-binding factor lens epithelium-derived growth factor (LEDGF). This indicates that menin is a molecular adaptor coordinating the functions of multiple proteins. Despite its
Figure 1 | Overview of the human menin–MLL1$_\mathrm{MBM}$ complex structure.

a, Isothermal titration calorimetry measurement of the menin–MLL1$_\mathrm{MBM}$ interaction. The inset shows the isothermal titration data. b, Overall structure of the menin–MLL1$_\mathrm{MBM}$ complex. The N-terminal domain is shown in orange, the thumb domain in green, the palm domain in blue, the fingers domain in cyan,


importance, how menin interacts with many distinct partners and regulates their functions remains poorly understood. Here we present the crystal structures of human menin in its free form and in complexes with MLL1, with JUND, or with an MLL1–LEDGF heterodimer. These structures show that menin contains a deep pocket that binds short peptides of MLL1 or JUND in the same manner, but that it can have opposite effects on transcription. The menin–JUND interaction blocks JUN N-terminal kinase (JNK)-mediated JUND phosphorylation and suppresses JUND-induced transcription. In contrast, menin promotes gene transcription by binding the transcription activator MLL1 through the peptide pocket. At the same time, it still interacts with the chromatin-anchoring protein LEDGF at a distinct surface formed by both menin and MLL1.
The amino-terminal region of MLL1 interacts with menin. Isothermal titration calorimetry measurements showed that the menin-binding motif (residues 6–25) of MLL1 (MLL1$_\mathrm{MBM}$) is necessary and sufficient for menin binding (Fig. 1a and Supplementary Fig. 1a–c).




and loop regions that are disordered or not included in the crystal structure are shown as dashed lines. MLL1$_\mathrm{MBM}$ is shown as a stick model in yellow. c, The surface representation of menin indicates that menin adopts a curved left-hand-shaped conformation. d, Front view of the menin–MLL1$_\mathrm{MBM}$ complex, coloured according to the degree of amino acid conservation among menin homologues.
MLL2, the closest relative of MLL1, contains a sequence at its N terminus that is almost identical to MLL1$_\mathrm{MBM}$ (Supplementary Fig. 1b). This MLL2 peptide (MLL2$_\mathrm{MBM}$) binds to menin with an affinity comparable to that of MLL1$_\mathrm{MBM}$ (Supplementary Fig. 1d). To understand how MLL1 and MLL2 (collectively referred to as MLL) are recognized by menin, we determined the crystal structures of human menin alone or in complex with MLL1$_\mathrm{MBM}$ (Supplementary Fig. 2, Supplementary Table 1 and Supplementary Information). The structure of human menin closely resembles a recently published structure of a menin homologue from Nematostella.

The conformation of menin resembles a curved left ‘hand’ with a 
deep pocket formed by its ‘thumb’ and ‘palm’ (Fig. 1b, c). Menin consists 
Figure 2 | Structural and mutational analyses of the menin–MLL1$_\mathrm{MBM}$ interaction. a, Stereo view of the menin–MLL1$_\mathrm{MBM}$ interface. The intermolecular hydrogen bonds are shown as dashed magenta lines. b, Phe 9 (yellow) is nested in a hydrophobic pocket of menin formed by the thumb (green) and the palm (blue). c, Electrostatic surface potential of the MLL1$_\mathrm{MBM}$-binding cavity of menin (positive potential, blue; negative potential, red). d, Co-immunoprecipitation of wild-type (WT) or mutant menin and
MLL1 proteins from 293T cells. Arrows and asterisks indicate the positions of MYC–MLL1 and immunoglobulin G (IgG), respectively. IB, immunoblot; IN, input; IP, immunoprecipitation. e, f, Expression of Hoxc8 (e) and distributions of menin, MLL1 and H3K4me3 at the Hoxc8 promoter (f) in Men1$^{-/-}$ mouse embryonic fibroblasts (MEFs) complemented with control vector, WT or mutant menin (n = 6; error bars, standard deviation).
of four associated domains (Fig. 1b, c, Supplementary Fig. 3 and Supplementary Information):
- an N-terminal domain characterized by a long β-hairpin;
- a transglutaminase-like domain that forms the thumb;
- a helical palm domain that contains three TPR motifs;
- a carboxy-terminal fingers domain.

Menin is highly conserved across species. The conserved residues are either buried in the hydrophobic core or clustered together on a surface patch that covers the thumb and palm (Fig. 1d). MEN1 disease-derived missense and in-frame deletion mutations are evenly distributed throughout the protein (Supplementary Fig. 4). This indicates that all four domains are important for the in vivo function of menin (Supplementary Fig. 4 and Supplementary Table 2).

The MLL1$_\mathrm{MBM}$ peptide adopts a compact conformation and plugs into the deep pocket of menin (Fig. 2a and Supplementary Fig. 5). Mutagenesis data indicate that MLL1$_\mathrm{MBM}$ residues Arg 6–Trp 7–Arg 8–Phe 9–Pro 10–Ala 11–Arg 12–Pro 13 contribute the most towards the interaction, together with their interacting residues in menin (Supplementary Figs 6 and 7, and Supplementary Tables 3 and 4). The side chain of Phe 9 fits into a hydrophobic cavity formed by the thumb and palm of menin (Fig. 2b). A menin Met278Trp substitution altered the cavity shape and led to complete loss of binding (Supplementary Table 4). The MLL1$_\mathrm{MBM}$-binding pocket is highly acidic (Fig. 2c). The two C-terminal arginine residues (Arg 24 and Arg 25) in MLL1$_\mathrm{MBM}$ are disordered. They nonetheless seem to be important for the interaction, because substituting them with glutamate decreased binding affinity 21-fold (Supplementary Table 3). Consistent with this, mutation of the acidic residues of menin also led to decreased binding (Supplementary Table 4).
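A fold change in affinity corresponds to a binding free-energy penalty, $\Delta\Delta G = RT\ln(K_{d,\mathrm{mut}}/K_{d,\mathrm{WT}})$. The measurement temperature is not stated here, so this rough sketch assumes 25 °C. Under that assumption, the 21-fold loss caused by the glutamate substitution corresponds to about 1.8 kcal mol$^{-1}$.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1

def ddg_from_fold_change(fold: float, temperature_k: float = 298.15) -> float:
    """Binding free-energy penalty (kcal/mol) for a fold increase in K_d."""
    return R_KCAL * temperature_k * math.log(fold)

print(f"21-fold weaker binding: ddG ~ {ddg_from_fold_change(21.0):.1f} kcal/mol")
```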

Next, we examined the MLL1$_\mathrm{MBM}$-binding activity of several MEN1 disease-derived mutations (His139Asp, Cys241Phe, Ala242Val, Gly281Arg, Ala284Gln and Thr344Arg). Ala284Gln and Thr344Arg yielded insoluble proteins; the remaining mutants impaired the menin–MLL1$_\mathrm{MBM}$ interaction (Supplementary Table 4). To further examine the menin–MLL1 interaction in vivo, we studied the interactions of mutant proteins transiently expressed in human embryonic kidney 293T cells. Consistent with the isothermal titration calorimetry analysis, co-immunoprecipitation data showed that mutations of the key residues at the interface completely abolished the menin–MLL1 interaction in cells (Fig. 2d).

Menin upregulates the expression of the homeobox genes Hoxc8 and Hoxc6 (ref. 5). To test the effect of the menin–MLL interaction on the expression levels of Hoxc8 and Hoxc6, we complemented menin-null mouse embryonic fibroblasts individually with wild-type menin or with MLL-binding-deficient menin mutants. Western blot analyses indicated comparable expression of wild-type and mutant proteins in cells (Supplementary Fig. 8a). When Men1$^{-/-}$ cells were complemented with wild-type menin, expression of Hoxc8 and Hoxc6 increased dramatically compared with vector-expressing cells (Fig. 2e and Supplementary Fig. 8b). In contrast, overexpression of the menin mutants in Men1$^{-/-}$ cells failed to upregulate the messenger RNA levels of Hoxc8 or Hoxc6 (Fig. 2e and Supplementary Fig. 8b). This suggests that the menin–MLL interaction is essential for Hoxc8 and Hoxc6 expression.

Next, we performed chromatin immunoprecipitation (ChIP) assays to determine the binding of mutant menin at the Hoxc8 promoter. Except for Ala284Gln (a mutant that yields insoluble protein), all other mutants bound to the Hoxc8 promoter as effectively as wild-type menin (Fig. 2f). Expression of wild-type or mutant menin did not greatly affect H3 distribution at the Hoxc8 promoter (Supplementary Fig. 8c). Notably, Men1$^{-/-}$ cells complemented with wild-type menin showed a substantial increase in MLL1 binding and in histone H3 lysine 4 trimethylation (H3K4me3) at the Hoxc8 promoter. This increase was measured relative to vector-expressing and mutant-menin-expressing cells (Fig. 2f). Therefore, the menin mutants were able to bind to the Hoxc8 promoter, but their ability to recruit MLL1 and thus establish H3K4me3 there was compromised, resulting in reduced Hoxc8 expression.

LEDGF, a chromatin-associated protein’’, is required for MLL1-dependent
transcription and leukaemic transformation’. Isothermal
titration calorimetry measurement showed that a complex composed
of menin and an N-terminal fragment of MLL1, called MLL1MBM-LBM
(comprising residues 6-153 and including both the menin-binding and
LEDGF-binding motifs), binds to the integrase-binding domain of
LEDGF (LEDGFIBD) with a dissociation constant of 470 nM (Fig. 3a and
Supplementary Fig. 9a). In contrast, neither menin nor MLL1MBM-LBM
alone could interact with LEDGFIBD (Supplementary Fig. 9b)'*. We
determined the menin-MLL1MBM-LBM-LEDGFIBD complex structure
at a resolution of 3.0 Å (Supplementary Fig. 9c and Supplementary
Table 1). MLL1MBM-LBM adopts an extended conformation and binds
to menin through two major sites (Fig. 3b): the N-terminal MLL1MBM
coil folds into the high-affinity pocket of menin in the same manner as
in the menin-MLL1MBM structure (Supplementary Figs 9d, e), whereas
the C-terminal helix α2 packs on the surface of the N-terminal domain
of menin to form a V-shaped groove for LEDGFIBD binding (Fig. 3b and
Supplementary Fig. 9f). The middle loop of MLL1MBM-LBM spans a large
distance on menin without many specific interactions except for two
leucine residues (Leu 106 and Leu 116) with side-chains that point to
Figure 3 | Structure of the menin-MLL1MBM-LBM-LEDGFIBD ternary
complex. a, Domain organization of menin, MLL1 and LEDGF. The menin-binding
(MBM) and LEDGF-binding (LBM) motifs of MLL1 are shown in yellow, menin in
cyan, the integrase-binding domain of LEDGF in red and other regions in grey.
Interactions among the three proteins are shown in orange. b, Ribbon diagram of
the menin-MLL1MBM-LBM-LEDGFIBD complex. Menin is in cyan,
MLL1MBM-LBM in yellow and LEDGFIBD in red. c, The extended MLL1 loop
between MLL1MBM and MLL1LBM covers a large part of the surface area of menin.
d, Detailed view of the intermolecular three-helix bundle at the ternary interface.
two shallow pockets on the menin surface, defining the path of the loop
(Fig. 3c and Supplementary Fig. 9g). Helix aE of LEDGFIBD is
sandwiched between helix α2 of MLL1 and helix α4 of menin through both
hydrophobic and electrostatic interactions (Fig. 3d). In support of the
crystal structure, mutations of residues on the α4 helix of the
N-terminal domain of menin (Ala95Arg and Ser104Tyr) specifically
disrupted the interaction with LEDGF (Supplementary Table 5 and
Supplementary Fig. 10). Notably, Men1−/− cells that were complemented
with these two mutants failed to stimulate Hoxc8 expression
(Supplementary Fig. 11), suggesting that a functional menin-MLL1-LEDGF
complex is required for upregulation of Hoxc8 expression.
Together, our data show that menin functions as an adaptor molecule
that modulates gene expression by binding MLL1 at one site while also
interacting with LEDGF at a distinct surface.

Although MLL1 and MLL2 share many functional motifs, including
the menin-binding motif (Supplementary Fig. 12), MLL2 does not
contain a LEDGF-binding motif sequence and thus would not form
a ternary complex with menin and LEDGF. Given that the PWWP
domain of LEDGF, which contains a relatively well-conserved
Pro-Trp-Trp-Pro signature, is required for MLL1-mediated leukaemic
transformation’*"’, the inability of MLL2 to form a menin-MLL2-LEDGF
complex explains why only MLL1, and not MLL2, has so far
been described as a proto-oncogene that can be activated by
chromosomal translocations.

Menin also interacts directly with the transcription factor JUND”’. We
defined JUND residues 27-47 as the menin-binding motif
(JUNDMBM), which binds menin with a dissociation constant of 1.6 µM
(Fig. 4a and Supplementary Fig. 13). Sequence comparison of JUNDMBM
and MLL1MBM revealed a striking similarity (Fig. 4a), suggesting that
JUNDMBM might interact with menin through the same binding pocket
as MLL1MBM. Consistent with this idea, both isothermal titration
calorimetry and glutathione S-transferase (GST) pull-down assays showed
that MLL1 could efficiently compete with JUND for menin binding
(Supplementary Fig. 14).

We determined the menin-JUNDMBM complex structure, which
shows many similarities to the menin-MLL1MBM structure (Fig. 4b
and Supplementary Table 1). First, the Phe-Pro-(Ala or Gly)-(Arg or
Ala)-Pro motifs in both menin-binding motifs are almost identical in
overall conformation (Supplementary Fig. 15a). Second, Phe32, Pro33
and Pro36 of JUND interact with menin in the same way as their
counterparts in MLL1MBM (Supplementary Fig. 15b, c). Notably, two lysine
residues in JUNDMBM (Lys 46 and Lys 47), which are equivalent to Arg 24 and
Arg 25 of MLL1MBM (disordered in the menin-MLL1MBM structure), are visible
in the electron density map and point to an acidic surface on menin
(Supplementary Fig. 15c). Mutation of these lysine residues and other key
binding residues at the interface abolished or weakened the interaction
both in vitro and in vivo (Fig. 4c, Supplementary Fig. 16 and
Supplementary Table 6).

Menin uncouples JUND phosphorylation from JNK activation, but
the mechanism is poorly understood’’. The consensus JNK-docking
domain (D-domain) contains a cluster of basic amino acids preceding
two leucine residues” (Fig. 4d). JUNDMBM partially overlaps
a putative D-domain of JUND (JUNDD)”' (Fig. 4d). Both the basic
residues and the leucine residues in JUNDD are indispensable for JNK
docking on JUND and for JNK-mediated JUND phosphorylation
(Fig. 4e, f and Supplementary Fig. 17a, b). Thus, Lys 46 and Lys 47
are required both for menin binding and for JUND phosphorylation by
Figure 4 | Structural and functional studies of the menin-JUND
interaction. a, Sequence alignment of the menin-binding motif sequences of
JUND, MLL1 and MLL2. Conserved residues are highlighted in yellow.
b, Crystal structure of the menin-JUNDMBM complex. Menin is coloured as in
Fig. 1c and JUNDMBM is shown as a purple stick model. c, Co-immunoprecipitation
of WT or mutant menin and JUND from 293T cells.
d, Sequence comparison of the N termini of JUND and c-JUN. The menin-binding
motif sequence of JUND is highlighted in purple. Key residues in the
JNK-docking domain are denoted with blue dots and three phosphorylation
sites are labelled. e, In vitro GST-pull-down analysis of the interactions between
FLAG-tagged JNK and the indicated JUN proteins. f, In vitro phosphorylation
of WT or mutant JUND by JNK. g, In vitro phosphorylation of WT or
menin-binding-deficient JUND by JNK in the presence or absence of menin. h, Menin
suppresses JUND phosphorylation in response to anisomycin activation of JNK
in 293T cells. i, WT or mutant JUND plasmids were transfected into 293T cells
with AP1 and Renilla reporter plasmids, with or without menin. Luciferase
assays were performed 2 days after transfection (n = 4; error bars, standard
deviation).
JNK. This led us to test whether menin inhibits JUND phosphorylation
by sequestering JUND from JNK. In GST-pull-down assays, GST-JUND
pulled down JNK only in the absence of menin, indicating that
menin has a higher affinity for JUND than JNK does (Fig. 4e and Supplementary
Fig. 17a). Furthermore, when menin was added, phosphorylation of
JUND was clearly inhibited (Fig. 4g and Supplementary Fig. 17c). In
contrast, phosphorylation of menin-binding-deficient mutants of
JUND was not affected by menin (Fig. 4g and Supplementary
Fig. 17c). Next, we examined whether menin could inhibit JUND
phosphorylation in response to anisomycin activation of JNK in
293T cells'’. Although wild-type JUND phosphorylation was suppressed
by menin, menin-binding-deficient mutants remained robustly
phosphorylated in the presence of menin (Fig. 4h). Notably, menin had
no effect on JNK binding to, or JNK-mediated phosphorylation of,
c-JUN, a close homologue of JUND that lacks a menin-binding motif
(Fig. 4d, e, g and Supplementary Fig. 17a, c, d). Together, our findings
reveal that the menin-JUND interaction blocks JNK docking on JUND
and inhibits JNK-mediated phosphorylation.

Menin represses JUND-mediated transcriptional activation*’. To
examine whether this repression depends on the menin-JUND interaction,
wild-type or menin-binding-deficient mutants of JUND were
co-transfected into 293T cells with an AP1 luciferase reporter plasmid
in the presence or absence of menin (Supplementary Fig. 17e).
Consistent with previous studies, transactivation by JUND was
effectively repressed by menin”” (Fig. 4i). In contrast, menin had only a
marginal effect on transcriptional activation mediated by mutant JUND
(Fig. 4i). We recently demonstrated that JUND induces gastrin gene
expression in human AGS gastric cells and that this induction can be
suppressed by menin”. Consistent with the luciferase assay, menin
failed to suppress the gastrin upregulation induced by mutant
JUND, suggesting that the menin-JUND interaction is important for
regulating gastrin expression (Supplementary Fig. 18). Thus, we
conclude that the menin-JUND interaction plays a key part in suppressing
JUND-mediated transcriptional activation.

In summary, our structural and functional studies provide a
mechanistic explanation of how menin could both positively and negatively
regulate gene transcription. Our findings also provide evidence that
menin acts as a scaffold protein to assemble a menin-MLL1-LEDGF
ternary complex to coordinate gene transcription and promote
MLL1-fusion-protein-induced leukaemogenesis.


METHODS SUMMARY 


Human menin, LEDGFIBD and the MLL and JUND peptides were expressed in
Escherichia coli BL21(DE3) and purified by sequential affinity and gel-filtration
chromatography. Menin crystals were obtained in sitting drops over 100 mM
sodium cacodylate (pH 6.5) and 1.4 M sodium acetate. Crystallization of menin with
the MLL1MBM or JUNDMBM peptides was achieved by sitting-drop vapour diffusion with a
well solution containing 100 mM Tris-HCl (pH 7.0), 200 mM MgCl2 and 2.3 M NaCl.
The menin-MLL1MBM-LBM-LEDGFIBD complex was crystallized by hanging-drop
vapour diffusion against a well solution of 50 mM HEPES (pH 7.0), 1.6 M (NH4)2SO4,
10 mM MgCl2, 0.016% L-canavanine, 0.016% O-phospho-L-serine, 0.016% taurine,
0.016% quinine, 0.016% sodium glyoxylate monohydrate and 0.016% cholic acid, and
the crystals were dehydrated with a solution containing 50 mM HEPES (pH 7.0), 2.3 M
(NH4)2SO4 and 10 mM MgCl2. The menin-MLL1MBM complex structure was
determined by multi-wavelength anomalous dispersion to a resolution of 3.0 Å. The
structures of menin alone, the menin-MLL1MBM-LBM-LEDGFIBD ternary complex
and the menin-JUNDMBM complex were solved by molecular replacement and
refined to resolutions of 2.5 Å, 3.0 Å and 2.85 Å, respectively.


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS


Protein expression and purification. To facilitate crystallization, we genetically
deleted an unstructured loop (residues 460-519) in menin, a short fragment (residues
40-45) in JUNDMBM and two loop regions (residues 16-22 and 36-102) in
MLL1MBM-LBM. All the resulting proteins retain wild-type-like binding affinities
(Supplementary Figs 2d, 9c and 13d). For simplicity, MeninΔ, JUNDMBMΔ and
MLL1MBM-LBMΔ are referred to as menin, JUNDMBM and MLL1MBM-LBM,
respectively, unless stated otherwise.

Various human menin proteins and the MLL and JUND peptides were
expressed in E. coli BL21(DE3) using a modified pET28b vector with a SUMO
protein fused at the N terminus after the His6 tag. After induction for 16 h with
0.1 mM isopropyl β-D-1-thiogalactopyranoside (IPTG) at 25 °C, the cells were collected by
centrifugation and the pellets were resuspended in lysis buffer (50 mM Tris-HCl,
pH 8.0; 50 mM NaH2PO4; 400 mM NaCl; 3 mM imidazole; 10% glycerol;
0.1 mg ml−1 lysozyme; 2 mM 2-mercaptoethanol; 1 mM PMSF; 5 mM benzamidine;
1 µg ml−1 leupeptin; and 1 µg ml−1 pepstatin). The cells were then lysed by sonication
and the cell debris was removed by ultracentrifugation. The supernatant was
mixed with Ni-NTA agarose beads (Qiagen) and rocked for 2 h at 4 °C before elution
with 250 mM imidazole. Ulp1 protease was then added to remove the His6-SUMO
tag. After Ulp1 digestion, the menin proteins and the MLL and JUND peptides were
further purified by gel-filtration chromatography on HiLoad Superdex 200 and
HiLoad Superdex 75 columns (GE Healthcare), equilibrated with buffer A (25 mM
Tris-HCl, pH 8.0; 150 mM NaCl; and 5 mM dithiothreitol (DTT)) and buffer B
(100 mM ammonium bicarbonate), respectively. The purified menin proteins were
concentrated to 25 mg ml−1 and stored at −80 °C. The purified peptides were
lyophilized, resuspended in water at a concentration of 50 mg ml−1 and stored
at −80 °C.

For the menin-MLL1MBM-LBM-LEDGFIBD complex, we cloned LEDGFIBD into
a modified pET28b vector with a SUMO protein fused at the N terminus after the
His6 tag. MLL1MBM-LBM was cloned into a GST fusion protein expression vector,
pGEX-6P-1 (GE Healthcare). The menin-MLL1MBM-LBM complex and LEDGFIBD
were expressed separately in E. coli BL21(DE3). The menin-MLL1MBM-LBM
complex was purified by sequential affinity chromatography with
Ni-NTA agarose beads and glutathione sepharose 4B beads (GE Healthcare). After
removal of the His6-SUMO tag and the GST tag with Ulp1 and 3C protease,
respectively, the complex was purified further by gel-filtration chromatography on a
HiLoad Superdex 200 column. Meanwhile, LEDGFIBD was purified in the same way as
menin and then mixed with the purified menin-MLL1MBM-LBM complex at a
molar ratio of 2:1. After 1 h incubation on ice, the protein mixture was purified
again by gel-filtration chromatography on a HiLoad Superdex 200 column.

For the in vitro assays, mutant menin proteins were expressed in E. coli and
purified following the procedure described above. All the mutant menin proteins
displayed unaltered biophysical properties as analysed by gel-filtration
chromatography (data not shown), indicating that the altered affinities of the menin mutants
for MLL1MBM, MLL1MBM-LBM-LEDGFIBD and JUNDMBM are not attributable to a
change in the structural integrity of the resulting proteins.
Crystallization, data collection and structure determination. Menin was
crystallized by sitting-drop vapour diffusion at 4 °C. The precipitant solution
contained 100 mM sodium cacodylate trihydrate (pH 6.5) and 1.4 M sodium
acetate trihydrate. For the menin-MLL1MBM complex, purified menin was first
mixed with the MLL1MBM peptide at a molar ratio of 1:2 and the mixture was
incubated on ice for 1 h to allow complex formation. Crystallization of the complex
was achieved by sitting-drop vapour diffusion at 4 °C with the well solution
containing 100 mM Tris-HCl (pH 7.0), 200 mM MgCl2 and 2.3 M NaCl. A similar
procedure was used for crystallization of the menin-JUNDMBM complex. The
menin-MLL1MBM-LBM-LEDGFIBD complex was crystallized by hanging-drop
vapour diffusion at 4 °C with the well solution containing 50 mM HEPES
(pH 7.0), 1.6 M (NH4)2SO4, 10 mM MgCl2, 0.016% L-canavanine, 0.016%
O-phospho-L-serine, 0.016% taurine, 0.016% quinine, 0.016% sodium glyoxylate monohydrate
and 0.016% cholic acid. The crystals were then dehydrated with a solution
containing 0.05 M HEPES (pH 7.0), 2.3 M (NH4)2SO4 and 0.01 M MgCl2.

All of the crystals were gradually transferred into a harvesting solution containing
the respective precipitant solutions plus 5 M sodium formate, before being
flash-frozen in liquid nitrogen for storage. Data were collected under cryogenic
conditions (100 K). Selenomethionine multi-wavelength anomalous dispersion
data sets of the menin-MLL1MBM complex at the Se peak and inflection
wavelengths were collected at the Advanced Photon Source (APS) beamline 21-ID-D
and processed using HKL2000 (ref. 23). Seven selenium atoms were located and
refined, and the multi-wavelength anomalous dispersion phases were
calculated using SHARP”. The initial multi-wavelength anomalous dispersion map of
the menin-MLL1MBM complex was substantially improved by solvent flattening.
A model was manually built into the modified experimental electron density using
O (ref. 25) and further refined in Phenix”. Native data sets of menin and the menin
complexes were collected at APS beamline 21-ID-D and processed using
HKL2000. The structures were determined by molecular replacement using
Phaser in the CCP4i suite” and further refined in Phenix. The majority (~95%)
of the residues in all structures lie in the most favoured region of the
Ramachandran plot, and the remaining residues lie in the additionally allowed
regions.

Isothermal titration calorimetry. The equilibrium dissociation constants of the
menin-MLL1MBM, menin-JUNDMBM and menin-MLL1MBM-LBM-LEDGFIBD
interactions were determined using a VP-ITC calorimeter (MicroCal). The binding
enthalpies were measured at 20 °C in 25 mM Tris-HCl (pH 8.0) and 150 mM
NaCl. Two independent experiments were performed for every interaction
described here. Isothermal titration calorimetry data were analysed
and fitted using Origin 7 software (OriginLab), with blank injections of peptides
into buffer subtracted from the experimental titrations before analysis.
Yeast two-hybrid assay. The yeast two-hybrid assays were performed using the
yeast L40 strain harbouring pBTM116 and pACT2 (Clontech) fusion plasmids.
Colonies containing both plasmids were selected on -Leu -Trp plates.
β-Galactosidase activities were measured according to the Clontech
MATCHMAKER library protocol, and the averages from three individual
transformants are reported.

Plasmid construction. To generate recombinant retroviruses, pMX-2 FLAG-menin
was constructed by inserting polymerase chain reaction (PCR)-amplified
menin cDNA into the BamHI/NotI site of the retroviral vector pMX-2 FLAG.
To generate menin mutants, pMX-2 FLAG-menin was used as a template for
site-directed mutagenesis using the QuikChange kit from Agilent.

Cell culture and transfection. Menin-null MEFs, HEK293T cells and the human AGS
gastric adenocarcinoma cell line were cultured in Dulbecco’s modified Eagle’s
medium supplemented with 10% fetal calf serum and 1% PenStrep. Menin-null
MEFs were infected with empty vector, wild-type or mutant menin-expressing
retroviruses and, 72 h post-infection, were subjected to puromycin selection
(2 µg ml−1) for 2 days. AGS and 293T cells were transiently transfected with the
indicated expression vectors using Lipofectamine 2000 (Invitrogen) for 48 h.
Co-immunoprecipitation. Human 293T cells were transfected with pcDNA3.1
vectors encoding c-MYC-tagged MLL1 (residues 1-153) and FLAG-tagged
menin. Two days after transfection, the cells were resuspended in 1 ml of lysis
buffer (20 mM Tris-HCl, pH 7.5; 150 mM NaCl; 1.0% Triton X-100; 1 mM EDTA;
and protease inhibitor cocktail). Lysates were immunoprecipitated
using 20 µl anti-FLAG M2 affinity agarose (Sigma). After washing with lysis
buffer, immunoprecipitated proteins were eluted with 2× loading buffer
(50 mM Tris-HCl, pH 6.8; 2% SDS; 10% 2-mercaptoethanol; 10% glycerol; and
0.002% bromophenol blue), resolved by 4-20% SDS-polyacrylamide gel
electrophoresis (SDS-PAGE) and then transferred to a polyvinylidene fluoride
(PVDF) membrane. After blocking with TBST buffer containing 5% skimmed milk,
proteins on the membrane were detected by western blot using anti-FLAG (Sigma)
and anti-c-MYC (Santa Cruz Biotechnology) antibodies. The same procedure was
used for the co-immunoprecipitation experiments with menin and JUND.

Quantitative real-time PCR analysis. Exponentially growing MEFs were seeded 
at 2X 10° cells per 100-mm dish and harvested 2 days later. AGS cells were 
transfected with the menin and JUND expression vectors for 48 h. Total RNA 
was isolated with an RNeasy minikit from Qiagen. Quantitative real-time PCR 
(qRT-PCR) was performed in an ABI 7500 Real Time PCR system (Applied 
Biosystems). 
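The Methods do not state how relative mRNA levels were calculated from the qRT-PCR data. One widely used option is the comparative Ct (2^−ΔΔCt) method, and the sketch below assumes it. The function name, the reference-gene normalization and the Ct values are all hypothetical and are not taken from this study. The method also assumes near-100% amplification efficiency for both the target and reference genes.

```python
def fold_change_ddct(ct_target_sample: float, ct_ref_sample: float,
                     ct_target_control: float, ct_ref_control: float) -> float:
    """Relative expression by the comparative Ct (2^-ddCt) method.

    Assumes ~100% amplification efficiency (one doubling per cycle)
    for both the target gene and the reference gene.
    """
    d_ct_sample = ct_target_sample - ct_ref_sample      # normalize sample to reference gene
    d_ct_control = ct_target_control - ct_ref_control   # normalize calibrator to reference gene
    dd_ct = d_ct_sample - d_ct_control                  # sample relative to calibrator
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values: target gene in wild-type-menin cells vs vector-expressing cells
fold = fold_change_ddct(ct_target_sample=24.0, ct_ref_sample=18.0,
                        ct_target_control=27.0, ct_ref_control=18.0)
print(f"fold change vs vector: {fold:.1f}")  # fold change vs vector: 8.0
```

A three-cycle earlier threshold in the sample, after reference-gene normalization, corresponds to an eightfold higher transcript level.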

ChIP assay. MEFs were cross-linked with 1% formaldehyde for 10 min at 37 °C.
Cross-linking was stopped by addition of 125 mM glycine. The ChIP assay was
performed using the QuikCHIP kit from Imgenex, according to the manufacturer’s
instructions. Antibodies used for ChIP were anti-menin (Bethyl Laboratories),
anti-MLL1, anti-histone H3K4me3, anti-histone H3 and IgG (Abcam).
The antibody-precipitated DNA-protein complexes were reverse cross-linked, the
DNA was isolated by phenol-chloroform extraction, and the precipitated DNA
was used as the template for PCR.

GST-pull-down assay. GST, GST-fused c-JUN (residues 1-246), GST-fused
JUND (residues 1-150) and FLAG-tagged JNK3 were expressed in E. coli
BL21(DE3) and purified to homogeneity. GST-pull-down assays were performed
by incubating 10 µg of GST or GST-JUN and 10 µg of FLAG-JNK3 with 10 µl
of glutathione sepharose 4B beads, with or without 20 µg of full-length
menin, in binding buffer (50 mM Tris-HCl (pH 8.0) and 150 mM NaCl) at 4 °C
overnight. The beads were then washed extensively four times with binding buffer,
and the bound proteins were eluted with 10 mM reduced glutathione in binding
buffer. After separation on 15% SDS-PAGE and Ponceau S staining, FLAG-tagged
JNK3 protein was detected by western blot using anti-FLAG antibody.

In vitro kinase assay. WT c-JUN (residues 1-246) and WT or mutant JUND
proteins (residues 1-150) were expressed in E. coli BL21(DE3) and purified as
described above for menin. For the in vitro kinase assay, 2 µg of
substrate was mixed with 0.5 µg of kinase and 50 µM ATP, with or without 10
µg of full-length menin protein, in kinase buffer (50 mM Tris-HCl, pH 7.5;
20 mM MgCl2; 20 mM β-glycerophosphate; 2 mM DTT; and 0.1 mM sodium
orthovanadate) and incubated at 30 °C for 1 h. The reaction mixtures were then
separated on 15% SDS-PAGE and visualized with Ponceau S staining, and the
phosphorylated JUN proteins were detected with an anti-JUND phospho-Ser100
(anti-c-JUN phospho-Ser73) antibody (Cell Signaling).

In vivo kinase assay. 293T cells were transfected with expression vectors encoding
FLAG-tagged menin and c-MYC-tagged JUND. After 48 h of transfection, cells
were incubated for 30 min with or without 10 µg ml−1 anisomycin (Sigma), a
potent JNK activator, and the cell lysates were then subjected to western blot with
anti-JUND phospho-Ser100 (anti-c-JUN phospho-Ser73, Cell Signaling),
anti-FLAG and anti-c-MYC antibodies.

Luciferase assay. 293T cells were transfected with 1 µg of AP1 luciferase reporter
plasmid (Stratagene), which contains seven copies of the AP1-binding consensus
12-O-tetradecanoylphorbol 13-acetate-response element (TRE) upstream of the
luciferase reporter gene, 0.25 µg of Renilla reporter plasmid and 0.5 µg of WT or
mutant JUND plasmids, either without or with 0.5 µg of menin cDNA. Luciferase
assays were performed using the dual luciferase assay kit (Promega) 2 days after
transfection. To determine the protein expression in each transfection, 20 µg of cell
lysate was immunoblotted with anti-menin (Bethyl Laboratories) and anti-JUND
(Santa Cruz Biotechnology) antibodies.
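Dual-luciferase readings are commonly analysed in two steps. First, each firefly (AP1 reporter) value is divided by the Renilla value from the same well to correct for transfection efficiency. Then the replicates are summarized as mean ± standard deviation; the Fig. 4i legend reports n = 4 with standard-deviation error bars. The sketch below assumes that workflow, scales to a reporter-only baseline and uses the sample standard deviation. The readings and function names are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple


def normalized_activity(firefly: Sequence[float], renilla: Sequence[float]) -> List[float]:
    """Firefly/Renilla ratio per replicate well (corrects for transfection efficiency)."""
    if len(firefly) != len(renilla):
        raise ValueError("firefly and Renilla readings must be paired")
    return [f / r for f, r in zip(firefly, renilla)]


def relative_mean_sd(ratios: Sequence[float], reference_mean: float) -> Tuple[float, float]:
    """Mean and sample standard deviation after scaling to a reference condition."""
    scaled = [x / reference_mean for x in ratios]
    return statistics.mean(scaled), statistics.stdev(scaled)


# Hypothetical raw readings, n = 4 wells per condition
vector = normalized_activity([1000, 1100, 950, 1050], [500, 550, 475, 525])
jund = normalized_activity([8000, 8400, 7600, 8000], [500, 500, 500, 500])

baseline = statistics.mean(vector)  # reporter-only (vector) baseline
mean, sd = relative_mean_sd(jund, baseline)
print(f"JUND: {mean:.2f} +/- {sd:.2f} fold over vector")
```

Normalizing within each well before averaging keeps differences in transfection efficiency from inflating the spread between replicates.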
Cell fractionation. 10° 293T cells were collected and washed in cold PBS and
hypotonic buffer (10 mM Tris-HCl, pH 7.3; 10 mM KCl; 1.5 mM MgCl2; 0.2 mM
PMSF; and 10 mM β-mercaptoethanol). The cells were then allowed to swell for 15
min in hypotonic buffer. The swollen cells were homogenized with a glass
Dounce homogenizer (Wheaton) using the loose pestle until 80-90% of the cell
membranes were lysed. The nuclei were collected by centrifugation for 15 min at 3,300g,
resuspended in high-salt buffer (600 mM KCl; 20 mM Tris, pH 7.4; 25% glycerol;
1.5 mM MgCl2; and 0.2 mM EDTA) and homogenized to break the nuclear
membrane. The nuclear extracts were collected by centrifugation at 25,000g for
30 min and then fractionated on a Superose 6 gel-filtration column (GE
Healthcare). The resulting fractions were resolved by 10% SDS-PAGE and probed
with anti-menin, anti-MLL1 and anti-JUND antibodies.
Control of ground-state pluripotency by allelic regulation of Nanog


Yusuke Miyanari' & Maria-Elena Torres-Padilla' 


Pluripotency is established through genome-wide reprogramming 
during mammalian pre-implantation development, resulting in the 
formation of the naive epiblast. Reprogramming involves both the 
resetting of epigenetic marks and the activation of pluripotent-cell- 
specific genes such as Nanog and Oct4 (also known as Pou5f1)'~*. 
The tight regulation of these genes is crucial for reprogramming, 
but the mechanisms that regulate their expression in vivo have not 
been uncovered. Here we show that Nanog—but not Oct4—is 
monoallelically expressed in early pre-implantation embryos. 
Nanog then undergoes a progressive switch to biallelic expression 
during the transition towards ground-state pluripotency in the 
naive epiblast of the late blastocyst. Embryonic stem (ES) cells 
grown in leukaemia inhibitory factor (LIF) and serum express 
Nanog mainly monoallelically and show asynchronous replication 
of the Nanog locus, a feature of monoallelically expressed genes’, 
but ES cells activate both alleles when cultured under 2i conditions, 
which mimic the pluripotent ground state in vitro. Live-cell imaging
with reporter ES cells confirmed the allelic expression of Nanog and
revealed allelic switching. The allelic expression of Nanog is regulated
through the fibroblast growth factor-extracellular signal-regulated
kinase signalling pathway, and it is accompanied by
chromatin changes at the proximal promoter but occurs independently
of DNA methylation. Nanog-heterozygous blastocysts
have fewer inner-cell-mass derivatives and delayed primitive endoderm
formation, indicating a role for the biallelic expression of
Nanog in the timely maturation of the inner cell mass into a fully 
reprogrammed pluripotent epiblast. We suggest that the tight 
regulation of Nanog dose at the chromosome level is necessary for 
the acquisition of ground-state pluripotency during development. 
Our data highlight an unexpected role for allelic expression in 
controlling the dose of pluripotency factors in vivo, adding an extra 
level to the regulation of reprogramming. 

The development of the naive epiblast in the late blastocyst is 
orchestrated by pluripotency-associated transcription factors such as 
NANOG and OCT4. OCT4 is required for establishing and maintaining
the pluripotent state’, and NANOG is essential for the acquisition
of pluripotency’”. In ES cells, the pluripotent state is also governed by 
these core transcription factors. Nanog occupies a central position in 
this network, but the mechanisms regulating its expression are unclear. 

To address how pluripotency-associated factors are regulated during
reprogramming in vivo, we assayed de novo expression of
Nanog and Oct4 in mouse pre-implantation embryos by RNA
fluorescence in situ hybridization (RNA-FISH), which allows nascent
transcripts to be visualized. We found that Oct4 is actively expressed in
a small proportion of 2-cell-stage embryos, but most blastomeres
express Oct4 at the 4- and 8-cell stages (Fig. 1a, b). We detected active
Nanog expression in about half of all 4- and 8-cell-stage blastomeres,
which is consistent with previous reports of heterogeneous Nanog
expression’. Surprisingly, although all of the other genes that we
analysed were expressed biallelically, Nanog invariably showed monoallelic
expression in 2-, 4- and 8-cell-stage embryos (Fig. 1a, b and
Supplementary Movie 1). We next addressed whether the monoallelic
expression of Nanog is random or imprinted by using allele-specific
single-cell PCR with reverse transcription (RT-PCR) on 8-cell-stage
embryos from C57BL/6J × Mus musculus castaneus crosses.
Consistent with our RNA-FISH results, most blastomeres expressed
Nanog monoallelically (Fig. 1c and Supplementary Fig. 1). We observed
no bias in Nanog expression towards either the C57BL/6J or the M. musculus
castaneus allele (Fig. 1c). We conclude that Nanog expression is
monoallelic and random during early pre-implantation development.

Nanog expression becomes restricted to the inner cell mass (ICM) of 
the blastocyst'”, where it is needed for the naive epiblast to attain 
pluripotency’. As expected, the proportion of cells expressing Nanog 
and Oct4 declined gradually as the expression of these genes became 
restricted to inner cells (Fig. 2a). Remarkably, the proportion of 
biallelism in Nanog-expressing cells increased progressively from 2% 
at the 8-cell stage to 70% of the inner cells in the late blastocyst (Fig. 2b 
and Supplementary Movie 2). This switch to biallelism of Nanog is in 
contrast to the consistent biallelic expression of Oct4 and Actb 
(Fig. 2a). Sequential RNA-FISH and DNA-FISH confirmed the 
biallelic expression of Nanog (Supplementary Fig. 1). The ICM of 
the late blastocyst contains two lineages: the extra-embryonic 
primitive endoderm, and the ‘ground-state’ pluripotent epiblast®*, 
which gives rise to the embryo. Inner cells expressing Nanog 
biallelically also express Oct4 but not Gata4, a primitive endoderm 
marker’, and therefore are epiblast cells (Fig. 2c). This finding suggests 
that Nanog is predominantly expressed from both alleles in the 
naive epiblast. Furthermore, Nanog expression gradually reversed to 
monoallelic expression after implantation (Supplementary Fig. 2), in 
line with the downregulation of Nanog and concomitant loss of 
Figure 1 | Nanog expression is monoallelic in early embryos. a, RNA-FISH
for Nanog and Oct4 in 4- and 8-cell-stage nuclei. Scale bar, 2 µm. b, Proportion
of allelic expression for each gene in 2-cell-stage (2C), 4-cell-stage (4C) and 
8-cell-stage (8C) blastomeres. n, number of cells analysed. c, Summary of 
single-cell RT-PCR data for Nanog in 8-cell-stage blastomeres, according to 
biallelic, monoallelic (C57BL/6J or M. musculus castaneus (Castaneus)) and no 
expression (none). 
Figure 2 | Nanog shows biallelic expression in the naive, pluripotent
epiblast. a, RNA-FISH for Nanog, Oct4 and Actb in 8-cell-stage (8C),
16-cell-stage (16C) and 32-cell-stage (32C) embryos (before cavitation), and early
blastocysts (EBLs; E3.5) and late blastocysts (LBLs; E4.25). Data are shown as 
the percentage of Nanog-expressing cells (or Oct4- or Actb-expressing cells) 
relative to all cells analysed. b, Quantification of cells showing monoallelic or 
biallelic Nanog expression in Nanog-expressing 8-cell-stage blastomeres and 
inner cells at the 32-cell, EBL and LBL stages. *, P = 8.8 × 10°;
**, P = 3.1 × 10^−7; ***, P < 1.0 × 10 | (Fisher’s exact test). c, RNA-FISH in


pluripotency’. Thus, Nanog expression switches transiently from 
monoallelic to biallelic during the formation of the pluripotent epiblast 
in vivo, coinciding with the completion of the reprogramming process. 

We next investigated the allelic expression of Nanog in ES cells. 
When cultured in medium containing serum and LIF, 60-70% of ES 
cells express Nanog (refs 10, 11) (Supplementary Fig. 4). Under these 
conditions, Nanog is predominantly expressed from a single allele 
(Fig. 2d, e and Supplementary Movie 3). By contrast, all of the other 
genes analysed showed biallelic expression, including other pluripotency-associated
genes (Oct4, Sox2, Klf2 and Fgf4), heterogeneously expressed
genes (Pecam1, Bmp4 and Rex1 (also known as Zfp42)) and housekeep- 
ing genes (Actb and Gapdh) (Fig. 2d, e). Allele-specific, single-cell RT- 
PCR with hybrid ES cells showed that Nanog expression is independent 
of parental origin, similarly to the early embryo (Supplementary Fig. 3). 
Thus, Nanog shows random monoallelic expression both in early pre- 
implantation embryos and in ES cells. 

The proportion of monoallelic versus biallelic Nanog expression in 
the naive epiblast is significantly different from that in ES cells, the 
latter showing much lower biallelism (Fig. 2b, e). We hypothesized that 
this could reflect the more homogeneous expression of Nanog in the 
naive epiblast compared to ES cells®*. Ground-state pluripotency 
can be established in vitro through pharmacological inhibition (with 
two inhibitors (2i)) of MEK and GSK3β”. ES cells cultured with 2i
show increased levels of Nanog messenger RNA*” and homogeneous 
distribution of NANOG protein’ (Supplementary Fig. 4). Strikingly, 
we found that culturing ES cells in 2i resulted in a significant increase 
in biallelic Nanog expression (Fig. 2f, g, Supplementary Fig. 3 and 
Supplementary Movie 4). We confirmed Nanog biallelic expression 
with sequential RNA-FISH and DNA-FISH (data not shown). To 
address which signalling pathway(s) regulates the allelic expression 
of Nanog, we used combinations of pharmacological inhibitors®’*”*. 
We found that blocking fibroblast growth factor (FGF)–extracellular
signal-regulated kinase (ERK) signalling was sufficient to trigger the biallelic
expression of Nanog (Supplementary Fig. 4). The naive epiblast and ES 
cells are thought to be maintained in a ground state by preventing the 
differentiation that is induced through FGF-ERK signalling*’”. Thus, 
collectively these results suggest that ground-state pluripotency might 
be achieved through the activation of the second allele of Nanog. 
ICM cells of the LBL. The dashed line delineates individual ICM cells. d, ES cells
cultured with serum and LIF display primarily monoallelic Nanog expression. 
A representative RNA-FISH image of ES cells is shown. e, Distribution of 
monoallelism and biallelism of Nanog and the indicated genes in ES cells 
cultured with LIF. f, A representative RNA-FISH image of nuclei from ES cells 
cultured with 2i/LIF, showing biallelic expression of Nanog. g, Treatment of ES 
cells with 2i/LIF leads to increased biallelic Nanog expression. The proportion 
of allelic expression of Nanog, Oct4, and Actb is shown for ES cells cultured with 
LIF or 2i/LIF. a, b, g, n, number of cells analysed. c, d, f, scale bars, 2 µm.


The above changes in Nanog allelic expression upon pharma- 
cological treatment suggest that the Nanog alleles are dynamically 
regulated. To visualize the dynamics of activation of the two Nanog 
alleles in individual cells, we generated an ES-cell reporter in which a 
gene encoding destabilized Turbo green fluorescent protein 
(TurboGFP) was inserted immediately downstream of the NANOG- 
coding region in one Nanog allele and a gene encoding destabilized 
mCherry was inserted similarly into the other Nanog allele (Fig. 3a and 
Supplementary Fig. 5). Both fluorescent proteins dissociate from 
NANOG by self-cleavage of a 2A peptide and do not alter NANOG 
function. We observed heterogeneous distribution of TurboGFP or 
mCherry in ES cells grown in LIF, confirming that the monoallelic 
expression of Nanog is random (Fig. 3b). This is in contrast to the 
homogeneous distribution in ES cells grown in 2i, in which most cells 
expressed both fluorescent proteins, reflecting biallelic expression of 
Nanog (Fig. 3b). Using time-lapse analysis, we observed dynamic fluc- 
tuation of Nanog expression, in agreement with previous reports'°”’ 
(Supplementary Fig. 6 and Supplementary Movie 5). Moreover, we 
found that Nanog expression can switch between alleles. In most cases, 
the allelic switch occurred through an intermediate state of biallelic 
Nanog expression or no Nanog expression over several cell cycles. In a
minority of cases (2%), rapid allelic switching occurred during a single 
cell cycle (Fig. 3c and Supplementary Fig. 6). This switching is in 
contrast to other monoallelic genes, including imprinted genes, for 
which the inactive status of an allele is stably maintained’’. 

Given the potential importance of activation of the second Nanog 
allele in establishing ground-state pluripotency, we next addressed the 
chromatin signatures underlying biallelic activation. We asked whether
the allelic expression of Nanog is associated with any epigenetic feature. We found
that Nanog replicated asymmetrically in ES cells cultured with LIF 
alone—a feature of monoallelically expressed genes—but changed its 
replication pattern towards symmetric replication in ES cells cultured 
with 2i and LIF (denoted 2i/LIF) (Fig. 3d, e). This pattern is in contrast 
to the invariable asymmetric replication of imprinted genes (Snurf and
H19; Fig. 3d, e) and is consistent with changes in allelic Nanog expres- 
sion. Replication timing is a distinctive epigenetic fingerprint”'®”’, thus 
suggesting that the allelic regulation of Nanog could be associated with 
epigenetic signatures. Indeed, monoallelic expression is accompanied 
Figure 3 | The switch to biallelic expression of Nanog is accompanied by
changes in replication timing and increased recruitment of mediator and 
cohesin. a, Schematic of the Nanog knock-in reporter NGR. A PEST motif in 
the carboxy terminus of the fluorescent proteins allows monitoring of dynamic 
Nanog expression. iHyg, internal ribosome entry site (IRES) hygromycin; iNeo, 
IRES neomycin; mChe, mCherry; NLS, nuclear localization signal; tGFP, 
TurboGFP. b, Representative image of NGR ES cells cultured with LIF or 2i/LIF.
Scale bar, 10 µm. c, The incidence of allelic switching of Nanog expression
in ES cells. Cells were classified into four groups: monoallelic (TurboGFP- 
positive, green), monoallelic (mCherry-positive, red), biallelic (TurboGFP- and 
mCherry-positive, yellow) and no expression (black). The proportion of cells 
undergoing a transition between these four groups during a single cell cycle is 
indicated. Overall, 47% of cells showed a colour change in this period. n, 
number of cells analysed. d, The asymmetric replication of Nanog in ES cells 
cultured with LIF changes to symmetric replication upon treatment with 2i. 
by allele-specific DNA methylation and/or histone modifications.
However, we found that ES cells lacking all three DNA methyltrans- 
ferases’” had similar Nanog allelic expression to wild-type ES cells 
(Fig. 3f), indicating that the allelic regulation of Nanog is independent 
of DNA methylation, unlike that of imprinted genes”. 
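The replication-timing readout described above (nuclei classified as SD, SS or DD in the Figure 3 legend) reduces to a simple per-nucleus classification. The sketch below is illustrative only: it assumes each nucleus has already been scored as one or two DNA-FISH dots per homologous allele, and it reports the SD fraction over all scored nuclei, a denominator the text does not specify. The function names are hypothetical.

```python
from collections import Counter


def replication_class(dots_per_allele):
    """Classify one nucleus from the DNA-FISH dot count at each homologous allele.

    1 = single dot (allele not yet replicated); 2 = doublet (allele replicated).
    """
    pattern = tuple(sorted(dots_per_allele))
    classes = {(1, 1): "SS", (1, 2): "SD", (2, 2): "DD"}
    if pattern not in classes:
        raise ValueError(f"unexpected dot counts: {dots_per_allele}")
    return classes[pattern]


def sd_fraction(nuclei):
    """Fraction of scored nuclei with one replicated and one unreplicated allele.

    A high SD fraction indicates asynchronous (asymmetric) replication of the
    locus; using all scored nuclei as the denominator is an assumption here.
    """
    counts = Counter(replication_class(n) for n in nuclei)
    return counts["SD"] / sum(counts.values())
```

For example, four nuclei scored as (1, 2), (2, 1), (1, 1) and (2, 2) contain two SD nuclei, giving an SD fraction of 0.5.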

Next, we assessed allelic histone modifications on Nanog by using 
chromatin immunoprecipitation (ChIP), taking advantage of the fact 
that ES cells grown in LIF express Nanog mainly in a monoallelic manner 
and those grown in 2i/LIF primarily have two active alleles. ChIP showed 
a significantly higher enrichment of H3K4me3, a modification of active 
chromatin, at the transcription start site (TSS) of Nanog in ES cells grown
in 2i/LIF than those grown in LIF alone (Fig. 3g). This increase is in line 
with the higher biallelic expression in 2i/LIF than LIF alone. The 
H3K4me3 enrichment at the Nanog TSS was approximately twofold 
in 2i/LIF, possibly suggesting that H3K4me3 is enriched at the active 
Nanog allele but absent at the inactive one in ES cells expressing Nanog 
monoallelically. We observed significantly higher recruitment of the 
mediator subunit MED12, as well as the cohesin-loading factor 
NIPBL, to the Nanog enhancer in ES cells grown in 2i/LIF than in ES 
cells grown in LIF alone (Fig. 3g). This increased recruitment was accom- 
panied by increased RNA polymerase II initiation and elongation upon 
2i treatment (Supplementary Fig. 7). Indeed, changes in chromatin 
architecture mediated through simultaneous binding of mediator and 
cohesin to the Nanog locus facilitate looping between the enhancer and 
promoter, resulting in Nanog gene activation”. Collectively, our data 
suggest that Nanog allelic expression might be actively regulated through 
chromosomal changes over the Nanog locus. 

Mice that are heterozygous for Nanog (Nanog+/−) are born at
normal Mendelian ratios'. We hypothesized that if Nanog biallelic 
expression is important for the formation of the ICM and/or for the 
acquisition of ground-state pluripotency, then Nanog+/− embryos
may show developmental defects at the onset of Nanog biallelic 
The cell nuclei were classified as single/double (SD), single/single (SS) and 
double/double (DD) according to DNA-FISH signals*. n, number of nuclei 
analysed. *, P < 4 × 10^−7; **, P < 1.4 × 10° (Fisher’s exact test).
e, Representative image of DNA-FISH for Nanog (arrowheads) and Oct4 in ES 
cells cultured with LIF or 2i/LIF. Scale bar, 2 µm. f, Nanog allelic expression is
unaffected in the absence of DNA methyltransferase activity. Quantification of 
RNA-FISH for Nanog in wild-type (WT) ES cells and ES cells lacking all three 
DNA methyltransferases (TKO) cultured with LIF or 2i/LIF. g, ChIP for 
H3K4me3, MED12 or NIPBL along the Nanog locus (black line, top) in ES cells
cultured with LIF or 2i/LIF. The position of the ChIP amplicons is depicted by 
the thick boxes below the line, the TSS by an arrow, the first exon by the black 
box on the line, and the distal enhancer by the blue box on the line. The Oct4 
promoter region (Oct4 Pro) and distal enhancer (Oct4 DE) were positive 
controls (right)’°. The mean ± s.d. of three independent biological replicates is
shown. 


expression. To address this, we counted ICM cells in individual 
Nanog+/− blastocysts by using an anti-OCT4 antibody. Freshly collected
embryonic day 3.5 (E3.5) Nanog+/− and Nanog+/+ blastocysts had
similar ICM cell numbers (Supplementary Fig. 8a). This finding is in 
agreement with previous reports®*! and with our observations 
documenting Nanog monoallelic expression before this embryonic 
stage. By contrast, we found significantly fewer ICM cells in 
Nanog+/− blastocysts as the embryo progressed through development
to form the nascent epiblast (Fig. 4a), which also showed increased
apoptosis in the ICM (Supplementary Fig. 8b). No significant difference
in the number of NANOG-positive cells was observed between
wild-type (Nanog+/+) and heterozygous (Nanog+/−) embryos
(Fig. 4b, c). However, Nanog+/− embryos showed delayed primitive
endoderm formation (Fig. 4b, c). Because primitive endoderm formation
depends on a functional epiblast in a non-cell-autonomous
manner”, this delay suggests that the epiblast in Nanog+/− embryos is
functionally impaired. It should be noted that the reduction in primitive
endoderm cells does not fatally compromise development, owing to 
the regulatory capacity of the early post-implantation embryo. Thus, 
we conclude that the biallelic expression of Nanog is necessary for the 
timely maturation of the ICM into a functional epiblast and the 
accompanying primitive endoderm. 

NANOG has a dose-sensitive action in reprogramming®. Over- 
expression of NANOG is sufficient to prevent ES-cell differentiation’; 
conversely, Nanog+/− ES cells show spontaneous differentiation’,
suggesting that the tight regulation of Nanog levels is crucial for both 
reprogramming and differentiation. We found that this dose regu- 
lation occurs principally at the allelic level. Monoallelic expression is 
predominant during the early reprogramming phase in cleavage stages, 
but it switches to biallelic expression as the ICM matures into the 
pluripotent epiblast (Fig. 4d). Monoallelic Nanog expression in ES cells 
may be representative of the late ICM or the early post-implantation 
Figure 4 | Biallelic expression of Nanog is required for the timely
maturation of the ICM derivatives. a, Reduced proportion of ICM derivatives
in Nanog+/− blastocysts compared with Nanog+/+ blastocysts. Mid and late
blastocysts were staged according to their total cell number. Data are expressed 
as the percentage of ICM cells per embryo. The box plot represents the 
ninetieth, seventy-fifth, fiftieth (median), twenty-fifth and tenth percentiles, as 
well as the mean (black dot). b, Delayed primitive endoderm (PE) formation in 
Nanog+/− blastocysts. Embryos were staged in groups according to their total
cell number. The number of PE (GATA4-positive) and epiblast (EPI; NANOG- 
positive) cells was determined. c, Representative maximal confocal projections 
of late blastocysts, showing a reduced proportion of ICM derivatives in 
Nanog+/− embryos. Scale bar, 10 µm. d, Model for the allelic regulation of
Nanog and the acquisition of ground-state pluripotency. Nanog is expressed 
monoallelically during the cleavage stages and up to the early blastocyst. 
Biallelic expression of Nanog accompanies the formation of the fully
reprogrammed pluripotent epiblast. Biallelic expression of Nanog also occurs
in vitro under conditions that mimic the ground-state pluripotency of the naive 
epiblast (ES cells cultured with 2i/LIF). 


epiblast, in which there is a switch back to monoallelic expression. The 
expression of Nanog can switch between alleles, unlike most known 
monoallelic genes, raising the possibility that the regulation of Nanog 
allelic expression could be unique. Our evidence that such regulation is 
independent of DNA methylation and does not seem to involve typical 
repressive modifications (H3K27me3, H3K9me1, H3K9me2 and
H3K9me3; data not shown) also supports this possibility.
Ground-state pluripotency is facilitated by the suppression of FGF-ERK
signalling*’*. Our work suggests that the allelic regulation of
Nanog is involved in this process and that the increased expression of 
Nanog that accompanies the establishment of ground-state pluripotency 
is achieved through activation of its second allele. The dose regulation of 
Nanog through allelic switching could also contribute to promoting 
heterogeneity, which could provide a window of opportunity both for 
establishing ground-state pluripotency in the epiblast and for lineage 
segregation towards the primitive endoderm. We propose that the 
regulation of Nanog at the allelic level provides a novel mechanism 
to establish ground-state pluripotency in the naive epiblast. Thus, our
work has uncovered an unexpected role for the allelic regulation of key
transcription factors in stabilizing pluripotency.


METHODS SUMMARY 

The embryos were from F1 (C57BL/6J × CBA/H) or C57BL/6J × M. musculus
castaneus crosses. The blastocysts from Nanog+/− × C57BL/6J crosses” were
genotyped after immunostaining (47 Nanog+/+ and 49 Nanog+/− were analysed).
The ES-cell lines and culture conditions are described in the Methods. Briefly, 
when referring to LIF, ES cells were cultured in 15% FCS and LIF; when referring 
to 2i/LIF, ES cells were cultured in 15% FCS, LIF, 3 µM CHIR99021 and 1 µM
PD0325901. Embryos and ES cells were fixed with 4% paraformaldehyde in PBS for 
immunostaining. Images were acquired on a TCS SP5 confocal microscope (Leica). 
Image analysis was performed with the programs ImageJ, LAS AF Lite (Leica) 
or Imaris (Bitplane). For single-cell RT-PCR analysis, individual 8-cell-stage 
blastomeres were mechanically dissociated, and the cells were collected in 3.1 µl
Cell Lysis Buffer (Ambion) and processed for RT-PCR, after which the PCR products were digested with AvaII. All
primer sequences are listed in Supplementary Table 1. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS

Embryo collection and culture. Embryos were collected from ~6-week-old F1
(C57BL/6J × CBA/H) superovulated females crossed with F1 males. To obtain F1
hybrid embryos, superovulated C57BL/6J females were crossed with M. musculus 
castaneus males. Embryos were collected at the following times after human 
chorionic gonadotrophin injection: the 2-cell stage (48 h), 4-cell stage (54 h), 8-cell
stage (68 h), 16-cell stage (80 h), 32-cell stage (90 h), early blastocyst stage (98 h)
and late blastocyst stage (114 h). To obtain late blastocysts, freshly collected early
blastocysts were cultured in KSOM for 16 h. To obtain E4.5, E4.75 and E5.0 stage
embryos, embryos were collected from naturally mated females. The analysis of 
Nanog+/− embryos was done using the Nanog’ line”. Blastocysts from
Nanog+/− × C57BL/6J (+/+) crosses were processed for immunostaining and
reconstructed in three dimensions using Imaris software. Individual ICM cells 
were counted after immunostaining with an anti-OCT4 antibody. The number of 
primitive endoderm and epiblast cells was determined by counting individual cells 
after immunostaining for GATA4 and NANOG. The total number of cells was 
determined by counting nuclei stained with 4',6-diamidino-2-phenylindole 
(DAPI). Individual blastocysts were genotyped after confocal acquisition as 
described previously”. Primer sequences for genotyping are listed in 
Supplementary Table 1. All experiments were performed in accordance with the 
current legislation in France and the approval of the Regional Ethics Committee 
(ComEth’s). 

ES-cell culture. Mouse ES-cell lines, E14, BC1 and TKO (the latter of which lacked 
all three DNA methyltransferases), were cultured in DMEM with GlutaMAX 
(Invitrogen) containing 15% FCS, LIF, 1 mM sodium pyruvate, penicillin/
streptomycin and 0.1 mM 2-mercaptoethanol. The treatment of ES cells with
inhibitors was performed using 3 µM CHIR99021 (a GSK3β inhibitor), 1 µM
PD0325901 (a MEK inhibitor), 5 µM PD173074 (an FGF receptor tyrosine kinase
activity inhibitor) and 3 µM PD184352 (a MEK inhibitor). The hybrid ES-cell line
(BC1) was derived from E3.5 embryos from C57BL/6 × M. musculus castaneus
crosses as described previously”. 

Gene targeting. Mouse ES cells from the BD 10 line (C57BL/6) were electroporated 
with a first targeting vector consisting of 2A-3 x TurboGFP-NLS-PEST-IRES-Neo. 
The resultant G418-resistant ES-cell clone, carrying the TurboGFP knock-in on one Nanog allele, was subsequently
subjected to a second targeting with a vector containing 2A-3 x mCherry-NLS-PEST-IRES-Hyg.
Genomic DNAs from both G418- and hygromycin-B-resistant colonies
were screened for homologous recombination into each Nanog allele by PCR and 
Southern blotting. 

Immunostaining. Embryos or ES cells were fixed with 4% paraformaldehyde in 
PBS for 20 min at room temperature. After washing with PBS, embryos were
permeabilized with 0.5% Triton X-100 in PBS for 10 min and then incubated in 
blocking solution (0.2% BSA in PBS) for 30 min. Primary antibodies were anti-OCT4 
(611202, BD Pharmingen), anti-NANOG (MLC-51, eBioscience) and anti-GATA4
(sc-1237, Santa Cruz Biotechnology). After incubation in blocking solution contain- 
ing primary antibodies for 1 h, cells or embryos were washed three times with 0.01% 
Triton X-100 in PBS for 5 min each and then incubated in blocking solution contain- 
ing secondary antibodies labelled with Alexa fluorophores (Invitrogen), Cy3, Cy5 
(Jackson ImmunoResearch) or Kodak X-SIGHT 640 (Carestream Health). After 
washing with PBS, mounting was done in VECTASHIELD (Vector Labs). Images 
were collected on a TCS SP5 confocal microscope (Leica). For all images, DAPI is 
shown in blue. Image analysis was performed using the software ImageJ, LAS AF Lite 
(Leica) and Imaris (Bitplane). 

Whole-mount RNA-FISH. After removal of the zona pellucida by incubation in
acidic Tyrode’s solution (Sigma), embryos were washed twice with PBS and then
incubated in fixative containing 4% paraformaldehyde/1× PBS for 20 min at room
temperature. Embryos were permeabilized with 0.5% Triton X-100 in fixative for
10 min. After washing with PBS three times, embryos were pre-hybridized in a 2 µl
drop of hybridization buffer covered with mineral oil on a 35-mm glass-bottomed
dish for 30 min at 50 °C. The hybridization buffer consisted of 50% formamide,
10% dextran sulphate, 2× SSC, 1 µg µl^−1 Cot-1 DNA, 1 µg µl^−1 yeast tRNA
(Roche), 2 mM vanadyl ribonucleoside complex (New England BioLabs), 1 mg
ml^−1 polyvinyl pyrrolidone (PVP), 1 mM EDTA, 0.1% Triton X-100 and 1 mg
ml^−1 BSA. Embryos were then transferred to a 2 µl drop of hybridization buffer
containing 5 ng µl^−1 fluorescent probes and incubated at 50 °C overnight. After
washing three times with 2× SSC, 0.1% Triton X-100 and 1 mg ml^−1 PVP at 50 °C
for 10 min, embryos were then subjected to a gradient series (20%, 40%, 60%, 80%
and 100%) of VECTASHIELD containing DAPI. Images were acquired on a TCS
SP5 confocal microscope with a 63× glycerol immersion objective lens. For RNA-FISH
of ES cells, ES cells grown on gelatin-coated coverslips were fixed and
permeabilized as described above. After briefly washing with PBS, the coverslip 
was dehydrated in 70% ethanol and 100% ethanol for 10 min each and was sub- 
sequently air dried. Hybridization buffer containing the corresponding probes was 
applied to the coverslip. Subsequent hybridization, washing and mounting were 
performed as described above. The plasmids, BACs and fosmids used as probes
for RNA-FISH were as follows: PBSII-Nal234 for Nanog, GOF6.1 for Oct4, 
G135P62831G1 and G135P630G2 for Gata4, RP23-476J7 or G135P6579H5 for 
Actb, G135P60852Bé4 for Sox2, G135P65370D8 for Klf2, G135P600983F2 for Fgf4,
G135P60839B3 for Bmp4, RP23-129L1 for Gata6, G135P65026G5 for Hsp70.1, 
G135P604053D8 for Pecam1, G135P69364G7 for Stella, G135P602899G3 for 
Rex1 and G135P60163E2 for Gapdh. The PBSII-Na1234 plasmid contains the 
NANOG-coding region without any repetitive sequences, as determined by 
RepeatMasker. BAC and fosmid probes were obtained from the BACPAC 
Resources Center. The specificity of all probes was confirmed by PCR using 
specific primers. Probes were labelled with handmade Alexa 488-dATP, 
TAMRA-dATP, ATTO 647N-dATP using a nick translation kit (Roche) and 
purified with a QIAquick PCR Purification Kit (QIAGEN). Images were analysed 
by the software ImageJ and LAS AF Lite. For all of the images, DAPI is shown in 
blue. It should be noted that the monoallelic expression of Nanog was independent 
of the genetic background, as it was reproducible in all of the mouse strains 
analysed (F1 (C57BL/6J × CBA/H), F1 (C57BL/6J × M. musculus castaneus),
CD1 and C57BL/6J; data not shown). 

Statistical analysis of RNA-FISH. Only interphase cells were taken into account 
for all of the analyses. Images were analysed across three-dimensional planes by using 
the software ImageJ and LAS AF Lite. RNA-FISH signals detected over three sequential
Z-sections were judged as positive signals, to distinguish them from background noise.
Single RNA-FISH spots, or two adjacent spots within 1 µm that might result from DNA
replication, were judged as monoallelism. Cells with two,
three or four separate RNA-FISH spots were scored as biallelism. Statistical analysis 
was done using Fisher’s exact test, and the number of cells analysed in all experiments 
is indicated in the corresponding figure. 
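The scoring rule and statistical test described above can be sketched in a few lines of Python. This is an illustrative reimplementation, not the authors' analysis code: spot coordinates are assumed to be 3D centroids in micrometres already extracted from the image stacks, nuclei with more than four spots are left unscored (the text does not say how such nuclei were handled), and Fisher's exact test is written out with the standard library (two-sided, summing all tables no more probable than the observed one) rather than calling a statistics package. The function names are hypothetical.

```python
import math


def score_nucleus(spots, adjacency_um=1.0):
    """Score one interphase nucleus from its RNA-FISH spots, given as (x, y, z) in µm."""
    if not spots:
        return "none"
    if len(spots) == 1:
        return "monoallelic"
    if len(spots) == 2 and math.dist(spots[0], spots[1]) <= adjacency_um:
        # Two adjacent spots may be sister chromatids of one replicated allele.
        return "monoallelic"
    if len(spots) <= 4:
        return "biallelic"
    return None  # assumption: more than four spots is left unscored


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows could be culture conditions (e.g. LIF, 2i/LIF) and columns the
    numbers of monoallelic and biallelic nuclei.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of top-left count x with all margins fixed.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(n, col1)

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table at least as extreme (no more probable) as the observed one;
    # the small relative tolerance absorbs floating-point ties.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_observed * (1 + 1e-7))
```

As a sanity check, the textbook table [[3, 1], [1, 3]] gives P = 34/70 ≈ 0.486.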

DNA-FISH. DNA-FISH was performed as previously described**. Probes for 
DNA-FISH were as follows: RP23-117123 for Nanog, RP23-213M12 for Oct4, 
RP23-476J7 for Actb, RP23-106M4 for Cdx2, RP23-239M21 for Snurf and 
G135P602165C11 for H19. For all DNA-FISH images, DAPI is shown in blue.

Allele-specific, single-cell RT-PCR. Individual 8-cell-stage blastomeres from
hybrid embryos or single hybrid BC1 ES cells were used for this assay. Embryos 
or ES cells were treated with 0.25% trypsin and 1 mM EDTA at 37°C for 5 min 
with gentle pipetting. Single cells were manually collected by mouth pipette aided 
by a finely pulled glass tip, directly into 3.1 µl Cell Lysis Buffer (Ambion) containing
RNase inhibitor (Invitrogen) and 73 nM specific primers for Nanog and Actb.
The single-cell samples were snap frozen in liquid nitrogen and stored at −80 °C
until use. Lysates were incubated at 65 °C for 5 min and placed on ice for 5 min.
Reaction mixture (7 µl; 2 µl 5× RT buffer, 2 µl 2.5 mM dNTPs, 0.1 µl RNase
inhibitor, 0.2 µl Transcriptor (Roche) and 2.7 µl water) was added to each tube.
Reverse transcription was performed at 25 °C for 10 min, 37 °C for 15 min, 55 °C
for 30 min and 85 °C for 5 min. The reaction mixtures were subjected to a first
round of multiplex PCR with primer mixtures for Nanog and Actb. This first PCR
reaction mixture (0.5 µl) was subsequently used for a second round of PCR with a
specific primer set for Nanog or Actb. Pwo SuperYield DNA Polymerase (Roche)
was used for PCR reactions. After the second PCR round, PCR products for Nanog
were digested with the AvaII restriction enzyme and analysed by acrylamide gel
electrophoresis. Primer sets for Nanog- and Cdkn1-flanking single nucleotide 
polymorphism (SNP) sites are listed in Supplementary Table 1. SNPs in the 
NANOG-coding region were identified by using The Jackson Laboratory's 
Mouse SNP database (http://phenome.jax.org/SNP/). SNPs in Cdkn1 are described 
elsewhere”. Primer sequences are listed in Supplementary Table 1. 
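The allele calls in this assay rest on a restriction-fragment readout: a SNP that creates or destroys an AvaII site (recognition sequence G^GWCC, where W is A or T) lets the digestion pattern of the PCR product report which parental allele was transcribed. The sketch below illustrates that logic under stated assumptions: the SNP-dependent site is taken to be the only AvaII site in the amplicon, band presence is assumed to have already been read off the gel, and the sequences in the usage note are invented placeholders rather than real Nanog amplicons. The function names are hypothetical.

```python
import re

# AvaII recognizes G^GWCC (W = A or T). The degenerate pattern is closed under
# reverse complementation (GGACC pairs with GGTCC), so scanning one strand suffices.
AVAII_SITE = re.compile(r"GG[AT]CC")


def has_avaii_site(amplicon):
    """True if the amplicon sequence contains an AvaII recognition site."""
    return AVAII_SITE.search(amplicon.upper()) is not None


def call_expressed_alleles(cut_band, uncut_band, cut_allele, uncut_allele):
    """Call allelic expression for one cell from the digestion pattern.

    cut_band / uncut_band: whether digested and full-length products are seen.
    cut_allele / uncut_allele: parental allele carrying / lacking the SNP site.
    """
    expressed = [name for seen, name in ((cut_band, cut_allele),
                                         (uncut_band, uncut_allele)) if seen]
    if len(expressed) == 2:
        return "biallelic"
    if expressed:
        return f"monoallelic ({expressed[0]})"
    return "none"
```

For instance, if a hypothetical C57BL/6J amplicon contained GGACC and the Castaneus one did not, a cell showing only the full-length band would be called "monoallelic (Castaneus)".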

Quantitative (real-time) PCR. RNA was extracted from ES cells with an RNeasy 
Mini Kit (QIAGEN) and reverse transcribed with Transcriptor (Roche). Real-time 
PCR was performed with SYBR Green JumpStart Taq ReadyMix (Sigma) on a 
LightCycler 480 Real-Time PCR System (Roche). The relative expression level of 
each gene was normalized to the Gapdh expression level. Primer sequences are 
listed in Supplementary Table 1. 
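The text states only that expression was normalized to Gapdh. A common way to do this is the comparative Ct (2^−ΔCt, 2^−ΔΔCt) method, sketched below under the assumption of roughly 100% amplification efficiency for both primer pairs. Whether the authors used exactly this calculation is not stated, and the function names and Ct values are illustrative.

```python
def expression_vs_gapdh(ct_target, ct_gapdh):
    """Target expression relative to Gapdh, 2^-(Ct_target - Ct_Gapdh).

    Assumes ~100% efficiency, i.e. the product doubles every cycle.
    """
    return 2.0 ** -(ct_target - ct_gapdh)


def fold_change(ct_target, ct_gapdh, ct_target_ref, ct_gapdh_ref):
    """2^-ddCt: Gapdh-normalized expression in one condition over a reference condition."""
    return (expression_vs_gapdh(ct_target, ct_gapdh)
            / expression_vs_gapdh(ct_target_ref, ct_gapdh_ref))
```

With invented Ct values, a target at Ct 22 against Gapdh at 18 in one condition, versus 23 against 18 in the reference condition, gives a fold change of 2.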

ChIP assay. ChIP assays were performed with ES cells as previously described’*. 
Primer sequences are listed in Supplementary Table 1. The antibodies used were 
anti-histone H3 (ab1791, Abcam), anti-H3K4me3 (pAV-MEHAHS-024,
Diagenode), anti-MED12 (A300-774A, Bethyl Laboratories), anti-NIPBL (301-779A,
Bethyl Laboratories), anti-RNA polymerase II (PolII) (N-20, Santa Cruz
Biotechnology), anti-PolII CTDSer2P (ab5095, Abcam) and anti-PolII CTDSer5P
(ab5131, Abcam). The enrichment of histone modifications was quantified by
real-time PCR as described above and normalized to histone H3. For MED12,
NIPBL, PolII, PolII CTDSer2P and PolII CTDSer5P, the percentage input was
calculated for each ChIP fraction. 
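Percentage input and the H3 normalization can be computed from real-time PCR Ct values as sketched below. This is an illustrative calculation, not the authors' code. It assumes that the input sample represents a known fraction of the chromatin used per ChIP (1% here, a placeholder the text does not give) and that PCR efficiency is roughly 100%. Normalizing a histone mark to H3 is taken as the ratio of the two percentage-input values, which is one common convention. The function names are hypothetical.

```python
import math


def percent_input(ct_chip, ct_input, input_fraction=0.01):
    """Percentage of input chromatin recovered in a ChIP fraction.

    input_fraction: share of chromatin set aside as input (0.01 = 1%, an assumption).
    The input Ct is first adjusted so that it represents 100% of the chromatin.
    """
    ct_input_100 = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (ct_input_100 - ct_chip)


def mark_over_h3(ct_mark, ct_h3, ct_input, input_fraction=0.01):
    """Histone-mark enrichment normalized to total histone H3 at the same amplicon."""
    return (percent_input(ct_mark, ct_input, input_fraction)
            / percent_input(ct_h3, ct_input, input_fraction))
```

With a 1% input, a ChIP fraction whose Ct equals that of the input corresponds to 1% of input. A mark that amplifies one cycle later than H3 at the same amplicon gives a mark/H3 ratio of 0.5.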

Time-lapse imaging. NGR ES cells were cultured in Knockout DMEM (Invitrogen) 
supplemented with 15% Knockout Serum Replacement (Invitrogen), LIF, non-essential
amino acids, glutamine, gentamicin and 2-mercaptoethanol, under conditions
of 5% CO2 and 95% air. The cells were grown on Culture-Insert (Ibidi) placed
on laminin-511 (Biolamina)-coated 3-cm glass-bottomed dishes (Mattek). The
dishes were placed in an incubation chamber (Tokai Hit) on the microscope stage 
at 37 °C. An inverted microscope (Leica) attached to a Nipkow disk confocal micro- 
scope (Yokogawa Electric) and EMCCD camera (iXon, Andor Technology) was 
controlled with iQ software (Andor Technology). TurboGFP and mCherry were 
excited with 488-nm and 560-nm lasers, respectively. Images for each colour (green 
or red) were acquired across 17.5 µm (7 Z-planes) every 20 min for ~50 h.

Analysis of live-cell imaging data. Time-lapse data were analysed using the 
ImageJ software plug-in Circadian Gene Expression (CGE)”. Fluorescent images
were merged and used for the segmentation of nuclei. Cells (n = 200) were randomly 
selected, and the fluorescence intensities for TurboGFP and mCherry in each 
nucleus were measured in each time frame. Background intensity was subtracted 
from the mean fluorescence intensity and plotted as the expression level against 
time during a single cell cycle (~10 h) as shown in Supplementary Fig. 6. Cells that 
were positive for TurboGFP, positive for mCherry, positive for both or negative for 
both were scored as monoallelic (green), monoallelic (red), biallelic (yellow) or 
non-expressed (black), respectively. Based on the kinetics of TurboGFP and 
mCherry during a single cell-cycle, the proportion of cells showing a transition 
between these four groups was calculated. 
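The four-group scoring of the NGR reporter time-lapse data can be sketched as follows. The single intensity threshold that separates positive from negative nuclei is an assumption, because the text gives no cut-off. Each track is also assumed to be a list of per-frame, background-subtracted (TurboGFP, mCherry) pairs covering one cell cycle; at one frame every 20 min, a ~10 h cycle spans roughly 30 frames. Counting a cell as "changed" if it shows at least one colour change is one reading of the text rather than a confirmed definition. The function names are hypothetical.

```python
from collections import Counter


def colour_group(gfp, mcherry, threshold):
    """Assign one nucleus in one frame to a group from background-subtracted intensities."""
    green, red = gfp > threshold, mcherry > threshold
    if green and red:
        return "yellow"  # biallelic
    if green:
        return "green"   # monoallelic, TurboGFP allele
    if red:
        return "red"     # monoallelic, mCherry allele
    return "black"       # no detectable expression


def summarize_tracks(tracks, threshold):
    """Return the fraction of cells that change group within the tracked cycle,
    plus a count of every observed (from_group, to_group) transition."""
    transitions = Counter()
    changed = 0
    for track in tracks:
        groups = [colour_group(g, r, threshold) for g, r in track]
        # Keep only consecutive frames in which the group actually changes.
        steps = [(a, b) for a, b in zip(groups, groups[1:]) if a != b]
        transitions.update(steps)
        changed += bool(steps)
    return changed / len(tracks), transitions
```

For example, a track that goes from TurboGFP-only through both colours to mCherry-only records the transitions green to yellow and yellow to red, and counts as one changed cell.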
The microRNA miR-34 modulates ageing and
neurodegeneration in Drosophila 
Human neurodegenerative diseases have the temporal hallmark of
afflicting the elderly population. Ageing is one of the most prominent 
factors to influence disease onset and progression’, yet little is known 
about the molecular pathways that connect these processes. To 
understand this connection it is necessary to identify the pathways 
that functionally integrate ageing, chronic maintenance of the brain 
and modulation of neurodegenerative disease. MicroRNAs (miRNAs)
are emerging as critical factors in gene regulation during develop- 
ment; however, their role in adult-onset, age-associated processes is 
only beginning to be revealed. Here we report that the conserved 
miRNA miR-34 regulates age-associated events and long-term brain 
integrity in Drosophila, providing a molecular link between ageing 
and neurodegeneration. Fly mir-34 expression exhibits adult-onset, 
brain-enriched and age-modulated characteristics. Whereas mir-34 
loss triggers a gene profile of accelerated brain ageing, late-onset 
brain degeneration and a catastrophic decline in survival, mir-34 
upregulation extends median lifespan and mitigates neurodegenera- 
tion induced by human pathogenic polyglutamine disease protein. 
Some of the age-associated effects of miR-34 require adult-onset 
translational repression of Eip74EF, an essential ETS domain tran- 
scription factor involved in steroid hormone pathways. Our studies 
indicate that miRNA-dependent pathways may have an impact on 
adult-onset, age-associated events by silencing developmental genes 
that later have a deleterious influence on adult life cycle and disease, 
and highlight fly miR-34 as a key miRNA with a role in this process. 

Recent evidence reveals that miRNA pathways are important in the 
adult nervous system, notably in the maintenance of neurons and in the 
regulation of genes and pathways associated with neurodegenerative 
disease”. Given these findings, we considered that there may be a 
fundamental role for select miRNAs in ageing. We examined flies 
carrying a hypomorphic mutation in loquacious (loqs), a key gene in
fly miRNA processing* (Supplementary Fig. 1a). Flies bearing the
loqs f00791 mutation were viable, but detailed examination indicated a
significantly shortened lifespan (Supplementary Fig. 1b). Further analysis
indicated that loqs f00791 flies showed late-onset brain morphological
deterioration: although normal as young adults, by 25 days loqs f00791
flies developed large vacuoles in the retina and lamina of the brain
(Supplementary Fig. 1c). Although developmental processes may contribute
to shortened lifespan, the adult-onset brain degeneration of
loqs f00791 mutants indicated that one or more specific miRNAs may
be critically involved in age-associated events impacting on long-term 
brain integrity. 

To explore this question, we determined whether specific miRNAs 
displayed age-modulated expression in the brain. RNA was isolated 
from dissected brains of adult flies of young (3 days), mid (30 days) and 
old time points (60 days). Using an array for Drosophila miRNAs, we detected
29 miRNAs expressed in the adult brain (Fig. 1a). Whereas most miRNAs
maintained a steady level or decreased with age, one miRNA, mir-34,
increased (Fig. 1a). Small RNA northern blot analysis confirmed that
mir-34 expression was barely detectable during development, but
became high in the adult and was further upregulated with age
(Supplementary Fig. 2a, b). Expression of mir-34 was affected in loqs[f00791]
flies (Supplementary Fig. 1d). miR-34 falls into a category of
Drosophila miRNAs whose processing requires the exoribonuclease
nibbler (nbr). In the adult, mature miR-34 displayed three major
differentially sized forms (24 nucleotides, 22 nucleotides and 21
nucleotides) with a uniform 5' end, descending by single nucleotides
at the 3' end as a result of nbr-mediated trimming; only isoform c
became upregulated with age (Supplementary Fig. 2c and Fig. 1b, c; see
also refs 5-7).

miR-34 is a markedly conserved miRNA, with orthologues in fly, 
Caenorhabditis elegans, mouse and human showing identical seed 
sequence (Supplementary Fig. 2d). To define miR-34 function, flies 
deleted for the gene were generated (Supplementary Fig. 3a). The 
resulting mir-34 mutant flies retained normal wild-type expression 
of neighbouring genes, but selectively lacked mir-34 (Supplementary
Fig. 3b, c).

Figure 1 | Drosophila mir-34 expression is upregulated with age. a, Heat
map of fold-change of Drosophila miRNAs in brains aged 3 days, 30 days and
60 days. Twenty-nine miRNAs (shown) were flagged present out of a total of
seventy-eight. One-way analysis of variance defined significance for each
miRNA over all time points (***P < 0.001; n = 3 replicates). Genotype: iso31.
b, Fly miR-34 isoform c shows age-modulated expression in fly heads. Left
panels: miR-34 shows three major mature forms (labelled a, b and c), but only
isoform c increases with age. Right panels: quantification of miR-34 isoforms
with age. n = 3 independent experiments; signal density of all isoforms
normalized at the same time point to 2S rRNA loading control. *P < 0.01;
**P < 0.001; one-way analysis of variance, with post test: Tukey's multiple
comparison test. Genotype: 5905. c, Sequences of miR-34 isoforms are
generated through nbr-dependent 3'-end trimming.

To interrogate age-associated phenotypes carefully, we
generated mir-34 null flies in the same uniform homogeneous genetic 
background (see Methods). mir-34 mutants displayed no obvious 
developmental defects, consistent with its adult-onset expression. 
However, detailed examination of adult animals indicated that mir-34 
mutants, although showing normal adult appearance and early survival, 
displayed a catastrophic decline in viability just after 30 days (Fig. 2a 
and Supplementary Table 4). Analysis of age-associated functions 
revealed that young mutants (3 days) had normal locomotion and stress 
resistance, but by 20 days the mutants had dramatic climbing deficits 
and were markedly stress-sensitive compared to age-matched controls 
(Fig. 2b). Because mir-34 expression was brain-enriched, we also 
examined the brain. Typically, older flies show sporadic, age-correlated
vacuoles in the brain, a morphological hallmark of neural deterioration.
mir-34 mutants were born with normal brain morphology, but showed
dramatic vacuolization with age, indicative of loss of brain integrity
(Fig. 2c). Rescue with a 9-kb genomic DNA fragment containing mir-34
and its endogenous cis-regulatory elements (Supplementary Fig. 3a, b)
partially restored the age-associated expression of mir-34 to mir-34 null
flies in the same homogeneous genetic background (Supplementary
Fig. 3d). Although rescue was not complete, indicative of a complexity in
genomic elements that regulate mir-34, rescue was sufficient to mitigate
the mutant effects, indicating that miR-34 function normally underlies
these age-associated aspects (Supplementary Table 1).

Figure 2 | miR-34 modulates age-associated processes. a, mir-34 mutant flies
have a shortened lifespan (control: 64 days median, 90 days maximal lifespan;
mir-34: 40 days median, 64 days maximal lifespan; P < 0.0001, log-rank test).
Mean ± s.e.m., n = 240 male flies per curve. Genotypes: control, 5905;
mir-34−/−, miR-34 null-1 in 5905 homogeneous genetic background. b, mir-34
mutant flies have late-onset behavioural deficits. Left: for locomotion
behaviour, mir-34 mutant flies show normal climbing at 3 days. At 20 days,
50 ± 3.4% of mir-34 mutant flies fail to climb; in contrast, only 22.1 ± 2.4% of
control flies have defective climbing. Mean ± s.e.m. of 3 experiments,
n = 120-140 male flies per experiment. Right panel: for stress resistance,
mir-34 mutant flies have normal resistance to heat stress at 3 days, but become
markedly sensitive to heat shock with age, such that at 20 days only
27.5 ± 3.8% survive after heat stress. In contrast, 76.7 ± 9.6% of control flies
survive after the same treatment. Mean ± s.e.m. of 3 experiments, n = 120-140
male flies. ***P < 0.0001 (two-way analysis of variance). Genotypes as in a.
c, mir-34 mutant flies show age-associated brain degeneration. Top-left
panel: mir-34 mutant flies have normal brain morphology at 3 days. Major
anatomical structures: CB, central brain; La, lamina; Lo, lobula; LoP, lobula
plate; Me, medulla; Rt, retina. At 3 days, control flies have normal brain
morphology (not shown), but develop a small number of sporadic vacuoles at
30 days (top-right panel, arrowheads). Middle panel: aged mir-34 mutants
(30 days) show striking vacuoles in the medulla (arrows) and other regions of
the brain (arrowheads). Bottom: the number of vacuoles in mir-34 mutants is
significantly higher than in controls (22.2 ± 1.8 versus 1.5 ± 0.3 in medulla;
19.2 ± 2.5 versus 7.0 ± 0.9 in other regions of the brain; **P < 0.001, one-way
analysis of variance, with post test: Tukey's multiple comparison test).
Mean ± s.e.m., n = 10 independent male fly brains. Genotypes as in a. Scale
bar: 0.1 mm. d, mir-34 mutant flies have a transcriptional profile indicative of
accelerated ageing. Top panel: 173 age-correlated probe sets were defined from
a transcriptional profile of fly brains at 3 days, 30 days and 60 days of age.
Arrowheads indicate time points (3 days and 20 days) at which mir-34 mutants
and controls were compared. Genotype: iso31 flies used for transcriptional
profiles of normal ageing brains. n = 3 biological replicates for each time point.
P = 0.001, false discovery rate (FDR) = 0.062, linear regression model. Bottom
panel: scatter plot of the relative expression of the 173 probe sets, which
differs significantly between mir-34 mutants and age-matched controls
(P = 0.006, two-sample, paired Wilcoxon test). Positively correlated probe sets
(red) differ significantly between the two genotypes (P = 0.0001), and the
contour lines show that they tend to be more highly expressed in mir-34
mutants than in controls; negatively correlated probe sets (blue) do not
(P = 0.9583). Genotypes as in a. n = 5 biological replicates for each time point.

These data indicated that mir-34 mutants were normal as young adults,
but with age developed deficits reflective of much older animals,
including loss of locomotion, stress sensitivity and brain deterioration,
coupled with shortened lifespan. We therefore hypothesized that loss
of mir-34 accelerated brain ageing. To address this, we transcriptionally 
profiled the fly brain (3 days, 30 days and 60 days) from wild-type 
animals. On the basis of a linear regression model, we extracted 173
probe sets from this profile the expression of which was tightly corre- 
lated with the progression of normal ageing (Fig. 2d and Supplemen- 
tary Tables 2 and 3). We next made another set of brain transcriptional 
profiles for mir-34 mutants and controls of matched chronological age 
(3 days and 20 days). We measured relative changes of these probe sets 
between 3 days and 20 days within each genotype, and compared the 
extent of such changes between mir-34 mutants and controls. This 
indicated that the overall pattern of these probe sets was significantly 
different between the two genotypes (P = 0.006, two-sample, paired 
Wilcoxon test; Fig. 2d). In particular, most positively correlated probe 
sets displayed a faster pace of increase in mir-34 mutants compared 
to controls—thus showing accelerated age-associated expression 
changes in mir-34 mutants (Fig. 2d). This result, combined with the 
physiological and histological evidence of more rapid loss of age- 
associated functions, suggested that mir-34 mutants were undergoing 
accelerated brain ageing. 

miRNAs function by binding to the 3’ UTRs of target mRNAs and 
often result in downregulation of protein translation. We therefore 
reasoned that age-associated activities of miR-34 might be mediated 
through silencing of critical targets that have a negative impact on the 
adult animal. miRNA-target prediction algorithms indicated miR-34 
binding sites within the 3’ UTR of the Eip74EF gene; notably, these 
binding sites were conserved in the orthologous Eip74EF genes from 
different Drosophila species (Supplementary Fig. 4a). We confirmed 
the miR-34 interaction through mutations in the seed sequences of the 
predicted miR-34 binding sites in the 3’ UTR of the Eip74EF mRNA 
(Supplementary Fig. 4b). The Eip74EF gene is a component of steroid 
hormone signalling pathways. Although such pathways have generally 
been studied for effects during development, data have implicated 
these pathways in lifespan regulation.
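The seed-complementarity scan that miRNA-target prediction starts from can be illustrated with a short sketch. It looks only for exact 7mer-m8 sites, that is, the reverse complement of miRNA nucleotides 2-8. Real prediction algorithms weigh further features such as site type, conservation and context. The mature sequences below are assumed from miRBase entries for dme-miR-34-5p (24-nt form) and hsa-miR-34a-5p and should be checked before reuse. The 3' UTR fragment is invented and is not the Eip74EF sequence.

```python
# Assumed mature sequences (miRBase dme-miR-34-5p, 24-nt form; hsa-miR-34a-5p).
FLY_MIR34 = "UGGCAGUGUGGUUAGCUGGUUGUG"
HUMAN_MIR34A = "UGGCAGUGUCUUAGCUGGUUGU"

_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def to_rna(seq):
    """Normalize a DNA or RNA string to uppercase RNA."""
    return seq.upper().replace("T", "U")


def reverse_complement(rna):
    """Watson-Crick reverse complement of an RNA string."""
    return "".join(_PAIR[base] for base in reversed(rna))


def seed_site(mirna):
    """7mer-m8 target motif: reverse complement of miRNA nucleotides 2-8."""
    return reverse_complement(to_rna(mirna)[1:8])


def find_seed_sites(mirna, utr):
    """Return 0-based start positions of exact 7mer-m8 sites in a 3' UTR."""
    site = seed_site(mirna)
    utr = to_rna(utr)
    return [i for i in range(len(utr) - len(site) + 1)
            if utr[i:i + len(site)] == site]


# Hypothetical UTR fragment with two embedded sites.
example_utr = "AAACACUGCCAAAGGGCACUGCCU"
print(seed_site(FLY_MIR34))                              # CACUGCC
print(seed_site(FLY_MIR34) == seed_site(HUMAN_MIR34A))   # True
print(find_seed_sites(FLY_MIR34, example_utr))           # [3, 16]
```

Because the assumed fly and human sequences share nucleotides 2-8, both yield the same target motif, which mirrors the identical seed noted above. A scan like this only nominates candidate sites. The interaction itself was tested experimentally by mutating the predicted seed-match sites.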

The Eip74EF gene encodes two major protein isoforms, E74A and 
E74B (referred to as the E74A and E74B genes, respectively); the
isoforms share the same 3’ UTR (Supplementary Fig. 4a). Northern 
blots indicated that transcription of E74A, but not E74B, persisted in 
adults, overlapping the time period when mir-34 is expressed 
(Supplementary Fig. 4c). Given this, we focused on E74A as a regulated 
target of miR-34 in the adult. Despite robust expression of the mRNA 
transcript, the E74A protein was expressed at low levels in adult heads 
throughout lifespan (Fig. 3a, b and Supplementary Fig. 4d). In flies 
lacking miR-34, E74A protein was markedly increased (Fig. 3b); E74A 
was also deregulated in the loqs[f00791] mutant flies (Supplementary
Fig. 1e). Genomic rescue of mir-34 mitigated this deregulation of
the E74A protein (Fig. 3c). Fine temporal analysis indicated that the 
E74A protein was highly expressed in young flies, but underwent a 
marked decrease within a 24-h time window (Supplementary Fig. 5). 
This temporal pattern seemed to be mutually exclusive with that of
miR-34 (see Supplementary Fig. 2a). Moreover, in flies lacking 
miR-34, the downregulation of E74A protein during this critical period 
was dampened (Supplementary Fig. 5). This evidence indicates that 
adult-onset expression of mir-34 functions, at least in part, to attenuate 
E74A protein expression in the young adult, and to maintain that
repression through adulthood (Supplementary Fig. 4d).

We next determined whether deregulated expression of E74A protein 
contributed to the age-associated defects in mir-34 mutants. Because 
E74A function is essential during development, with strong mutations 
leading to pre-adult lethality, we used the mild but viable E74A[BG01805]
hypomorphic mutation (Supplementary Fig. 4a). When the E74A[BG01805]
mutation was combined with mir-34 mutant flies in the same
homogeneous genetic background, proper regulation of E74A protein
was partially restored (Fig. 3d), and age-associated defects due to loss of
mir-34, including shortened lifespan and brain vacuolization, were
mitigated (Fig. 3e, f; E74A[BG01805] mutants alone have a normal lifespan
(Supplementary Fig. 6a)). To assess further the adult activity of E74A,
we upregulated E74A in the adult with an E74A transgene, lacking
miR-34 binding sites, driven by a temperature-sensitive promoter. At
29 °C, these flies demonstrated increased levels of E74A expression in 
the adult (Supplementary Fig. 6b). Notably, these animals also showed 
late-onset brain degeneration (Supplementary Fig. 6c) and a signifi- 
cantly shortened lifespan (Supplementary Fig. 6d). These data indicate 
that deregulated expression of E74A has a negative impact on normal 
ageing, and that one function of miR-34 is to silence E74A in the adult 
to prevent the adult-stage deleterious activity of E74A on brain 
integrity and viability. 

During the course of these studies, we noted that mir-34
mutants also showed increased protein misfolding, a molecular process
implicated in ageing and common to many human neurodegenerative
diseases. Whereas with normal ageing the fly brain accumulates a
low level of inclusions that immunostain for stress chaperones like 
Hsp70/Hsc70, mir-34 mutants showed a marked increase compared 
to control flies of matched age (30 days) (Supplementary Fig. 7). Given 
that mir-34 expression increases with age, and mir-34 loss shows altered 
chaperone accumulation, we tested whether mir-34 expression itself 
is upregulated by stresses like heat shock or oxidative toxins, but found 
no evidence to support this (data not shown). However, given that loss 
of mir-34 caused an increase in protein misfolding, this raised the 
possibility that upregulation of mir-34 expression might mitigate 
disease-associated protein misfolding. In Drosophila, expression of a 
pathogenic ataxin-3 polyglutamine (polyQ) disease protein (SCA3trQ78) 
leads to inclusion formation, a decrease in polyQ protein solubility and 
progressive neural loss (Supplementary Fig. 8a). Upregulation of
mir-34 markedly mitigated polyQ degeneration, such that inclusion 
formation was slowed, the protein retained greater solubility, and neural 
degeneration was suppressed (Fig. 3g, h and Supplementary Fig. 8b-d). 
Lowering E74A expression by heterozygous reduction in flies expressing 
pathogenic polyQ protein revealed a minimal effect (data not shown), 
indicating that E74A may not be a target of miR-34 activity in this 
process. However, our studies with E74A were of necessity limited to 
hypomorphic alleles that may not uncover the full extent of E74A func- 
tion mediated by miR-34. Furthermore, additional targets of miR-34 
may be involved in different aspects of miR-34-directed pathways, 
including disease. 

Given this effect to mitigate disease-associated neural toxicity with 
upregulation of mir-34, and that mir-34 expression naturally increases 
with age, we investigated whether enhanced expression of mir-34 in 
wild-type flies could modulate the ageing process. We increased miR- 
34 dosage in wild-type flies with genomic rescue transgenes, which 
express mir-34 under its endogenous regulatory elements (see 
Supplementary Fig. 2a). Analysis of multiple independent transgenics 
in the same genetic background as the control indicated that
upregulation of miR-34 levels with genomic constructs (~20%,
Supplementary Fig. 3d) extended median lifespan by ~10%
compared to wild type (Fig. 3i; other traits, such as the occurrence of
brain vacuolization, despite being age-associated, are sporadic and
infrequent in normal flies, and thus were difficult to assess). Thus,
upregulation of mir-34 expression can protect from neurodegenerative 
disease and extend median lifespan. 

Figure 3 | The Drosophila Eip74EF gene is a target of miR-34 in modulation
of the ageing process. a, E74A mRNA is robustly expressed in the adult and
unchanged between age-matched controls and mir-34 mutants. In control flies,
E74A mRNA is significantly upregulated in 30-day compared to 3-day animals.
RNA was from male heads. Mean ± s.e.m., n = 3 independent experiments;
signal density of E74A mRNA normalized to 18S rRNA loading control
(*P < 0.01, one-way analysis of variance, with post test: Tukey's multiple
comparison test). Genotypes: control, 5905; mir-34−/−, mir-34 null-1 in 5905
homogeneous genetic background. b, E74A protein is deregulated in mir-34
mutants. Protein was from male heads. Mean ± s.e.m., n = 3 independent
experiments; signal density of E74A protein normalized to tubulin loading
control (*P < 0.01, one-way analysis of variance, with post test: Tukey's
multiple comparison test). Genotypes as in a. c, Deregulation of E74A protein
is diminished in mir-34 rescue flies. Protein was from male heads.
Mean ± s.e.m., n = 3 independent experiments; signal density normalized to
tubulin loading control (*P < 0.05, one-way analysis of variance, with post
test: Tukey's multiple comparison test). Genotypes: control, 5905; mir-34−/−,
mir-34 null-1 in 5905 homogeneous genetic background; mir-34−/−; mir-34(+),
mir-34 genomic rescue in mir-34 null-1 in 5905 homogeneous genetic background.
d, mir-34 mutants homozygous for the E74A[BG01805] allele have lower levels of
E74A protein. Protein was from male heads of 20-day flies raised at 29 °C.
Mean ± s.e.m., n = 3 independent experiments; signal density normalized to
tubulin loading control (*P < 0.01, one-way analysis of variance, with post test:
Tukey's multiple comparison test). Genotypes: control, 5905; mir-34−/−
E74A[BG01805]/+, mir-34 null-1 heterozygous for E74A[BG01805] in 5905
homogeneous genetic background; mir-34−/− E74A[BG01805]/E74A[BG01805],
mir-34 null-1 homozygous for E74A[BG01805] in 5905 homogeneous genetic
background. e, f, Reducing E74A protein levels in the adult mitigates
age-related defects of mir-34 mutants. mir-34 mutants also homozygous for
E74A[BG01805] show rescued lifespan (e) and brain morphology (f), compared to
mir-34 mutants heterozygous for E74A[BG01805] (these flies have a lifespan that
is the same as mir-34 mutants alone; see Supplementary Table 4). Flies raised at
29 °C. Lifespan: P < 0.0001 (log-rank test). Mean ± s.e.m., n = 150 male flies.
Brain vacuoles: *P < 0.01 (one-way analysis of variance, with post test: Tukey's
multiple comparison test). Mean ± s.e.m., n = 10 independent male animals.
Genotypes as in d. g, Upregulation of mir-34 reduces accumulation of
pathogenic polyQ protein inclusions. Left panels: in the retina of flies
expressing SCA3trQ78 alone, pathogenic polyQ protein is initially diffuse
(1 day, top), but gradually accumulates into nuclear inclusions (3 day, bottom).
Right panels: upregulation of mir-34 reduces inclusion formation. DAPI staining
highlights nuclei. 3-day controls show 53.75 ± 12.55 inclusions in a retinal
section versus 23.67 ± 7.57 with mir-34 upregulation; mean ± s.d., n = 3
cryosections from independent male animals; P < 0.01 (t-test). Genotypes:
SCA3trQ78 is w*; rh1-GAL4, UAS-SCA3trQ78/+. SCA3trQ78; mir-34(+) is w*;
rh1-GAL4, SCA3trQ78/+; UAS-mir-34/+. Scale bar, 0.05 mm. h, Upregulation of
mir-34 prevents neural degeneration. At 21 days, male flies expressing
SCA3trQ78 show a marked loss of photoreceptor neuronal integrity (middle
panel), with an average of only 2.46 ± 1.32 photoreceptors per ommatidium
remaining by pseudopupil analysis. Flies with upregulated mir-34 (right panel)
retain 6.90 ± 0.34 photoreceptors per ommatidium. Control (left panel) and
upregulation of mir-34 alone (not shown) have normal photoreceptor numbers
per ommatidium. Mean ± s.d., n = 619, 722 and 700 ommatidia for SCA3trQ78,
SCA3trQ78; mir-34(+) and control, respectively; ***P < 0.0001 (one-way
analysis of variance, with Bonferroni's multiple comparison test). Genotypes as
in g; control: w*; rh1-GAL4/+. Scale bar, 0.05 mm. i, Flies with upregulated
mir-34 (colour) have an extended median lifespan compared to control flies
(black and grey curves for repeats 1 and 2, respectively) (log-rank test).
Lifespan result for each genotype is indicated in median and maximal days.
Mean ± s.e.m., n = 150 male flies per genotype, 25 °C. Three independent
mir-34 genomic transgenic lines (4, 8, 9) were analysed. Genotypes: control,
5905; mir-34(+), mir-34 genomic rescue in 5905 homogeneous genetic
background.

Our findings indicate that miR-34 is a key Drosophila miRNA that couples
long-term maintenance of the brain with healthy ageing of the organism.
miR-34 activity, enhanced by its age-modulated expression and processing, is
critically involved in silencing of the E74A transcript through adulthood and
in modulation of protein homeostasis with age, as well as in polyQ disease.
Select neural cell types may be especially vulnerable in ageing and disease;
miR-34 function may have an impact on the integrity or activity of these
systems. Intriguingly, E74A seems to confer sharply opposing functions on
animal fitness at different life stages, being essential during pre-adult
development but harmful to the adult during ageing (this study). This
biological property, of a gene being beneficial at one life stage but damaging
at another, is referred to as antagonistic pleiotropy. Genes associated with
antagonistic pleiotropy are likely to be evolutionarily retained because of
their earlier beneficial function. Their adult-onset activities, however,
antagonize the ageing process if they are not properly regulated. miRNA
pathways provide a tantalizing mechanism by which to suppress potentially
deleterious age-related activities of such genes; a number of miRNAs have
been noted to show age-modulated expression and activity. Roles of select
miRNAs normally expressed in the adult may be of evolutionary advantage to
tune down events that promote age-associated decline and potentially disease,
in order to prolong healthy lifespan and longevity. Upregulation of lin-4, a
C. elegans miRNA with a known developmental role, extends nematode lifespan,
raising the possibility that this upregulation, like the natural increase of
mir-34 expression in Drosophila, functions to silence genes that have a
negative impact on ageing and potentially promote disease. Notably, mir-34
expression is elevated with age in C. elegans, and mammalian mir-34
orthologues are highly expressed in the adult brain and have also been noted
to increase with age and be misregulated in degenerative disease in humans.
Current data regarding miR-34 function indicate that it is neutral or adverse
in C. elegans, and can be either protective or contributory to age-associated
events in vertebrates. Thus, miR-34 seems to be a key miRNA poised to
integrate age-associated physiology; the precise function will reflect the
diverse spatiotemporal expression and activity of distinct orthologues, the
mRNA target spectrum, as well as the complexity of the adult brain and life
cycle. The conservation of miR-34, coupled with in-depth comparative analysis
of mir-34 expression, 3' end processing, targets and pathways in the ageing
process of nematodes, flies and mammals, makes it a tempting subject for
understanding features of ageing and disease susceptibility.


METHODS SUMMARY 


Flies were grown in standard media at 25 °C unless otherwise specified. Stock lines
and GAL4 driver lines were obtained from the Bloomington Drosophila Stock Center,
or have been described previously. Deletion of the mir-34 region was made by
site-specific recombination. Fly transgenics were generated by standard procedures.
Flies were generated or backcrossed a minimum of five generations into a controlled
uniform homogeneous genetic background (line 5905 (FlyBase ID FBst0005905,
w[1118])), to ensure that all phenotypes were robust and not associated with
variation in genetic background. In this uniform homogeneous genetic background,
the lifespan of control flies is highly uniform with repetition when 150 or more
individuals are used for lifespan analysis. Negative geotaxis and heat stress were
used to examine fly locomotion and stress resistance, respectively. Adult male heads
were processed for paraffin sections as described previously. To determine lifespan,
newly eclosed males were collected and maintained at 15 flies per vial, transferred
to fresh vials every 2 days and scored for survival. A total of 150-200 flies were used
per genotype per lifespan; all experiments were repeated multiple times (see
Supplementary Table 4). Lifespans were analysed in Excel (Microsoft) and by
Prism software (GraphPad) for survival curves and statistics. Techniques of
molecular biology, western immunoblots and histology were standard. Fly brain
RNA was prepared using Trizol reagent for array and mRNA analysis; miRNA
arrays were miRCURY LNA arrays version 8.1 (Exiqon), and mRNA expression
was profiled using Affymetrix Drosophila 2.0 chips (Affymetrix). The microarray
data can be found in the Gene Expression Omnibus (GEO) of NCBI through 
accession number GSE25009. 
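The median lifespan and log-rank comparisons reported in the figure legends can be sketched in a few lines. This is a hedged illustration of the standard calculations, not the Prism analysis used in the study. It assumes every fly is followed until death, with no censoring of escaped or accidentally killed flies, and the death-day lists are invented for illustration.

```python
from collections import Counter
from math import erfc, sqrt


def median_lifespan(death_days):
    """Median lifespan for uncensored data: first day on which survival <= 0.5."""
    n = len(death_days)
    alive = n
    for day, deaths in sorted(Counter(death_days).items()):
        alive -= deaths
        if alive / n <= 0.5:
            return day
    return None  # only reached for an empty list


def log_rank_p(group_a, group_b):
    """Two-sided log-rank P value for two uncensored samples of death days."""
    deaths_a, deaths_b = Counter(group_a), Counter(group_b)
    at_risk_a, at_risk_b = len(group_a), len(group_b)
    observed_minus_expected = 0.0
    variance = 0.0
    for day in sorted(set(deaths_a) | set(deaths_b)):
        d_a, d_b = deaths_a[day], deaths_b[day]
        d, n = d_a + d_b, at_risk_a + at_risk_b
        observed_minus_expected += d_a - d * at_risk_a / n
        if n > 1:  # with one fly at risk the term is 0/0 and contributes nothing
            variance += d * (at_risk_a / n) * (at_risk_b / n) * (n - d) / (n - 1)
        at_risk_a -= d_a
        at_risk_b -= d_b
    if variance == 0:
        raise ValueError("no time point with both groups at risk")
    chi2 = observed_minus_expected ** 2 / variance
    return erfc(sqrt(chi2 / 2))  # upper tail of a chi-square with 1 df


# Hypothetical death days (not data from this study).
control = [60, 62, 64, 64, 66, 70]
mutant = [35, 38, 40, 40, 42, 45]
print(median_lifespan(control), median_lifespan(mutant))  # 64 40
print(log_rank_p(control, mutant) < 0.01)                 # True
```

The identity used for the P value is that a chi-square variable with one degree of freedom is a squared standard normal, so its upper tail equals `erfc(sqrt(x / 2))`. Real lifespan data with lost or escaped flies would need censoring, which this sketch omits.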


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS


Genetic background. Fly lines were from the Bloomington Stock Center or have been
described previously. To control for background effects, and to assess the significance
of all effects, flies were generated in the same uniform homogeneous genetic background
(line 5905 (FlyBase ID FBst0005905, w[1118])) or backcrossed a minimum of five
generations into this uniform genetic background. This ensured that, for all
phenotypes, even modest but consistent effects were associated with the gene
manipulations and not with variation in background. With these carefully controlled
experiments, the lifespan of control flies was highly uniform upon repetition, 
when 150 or more individuals were used for lifespan analysis (see Supplemen- 
tary Table 4). 
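The expected effect of five generations of backcrossing can be roughly estimated under idealized assumptions that the text does not state. These assumptions are that the starting line carries no 5905-derived genome, that each generation is one cross to 5905 (counting the initial outcross as the first), and that the loci considered are unlinked to the allele or transgene being followed. Under them, the expected non-5905 fraction at such loci halves with every cross:

```latex
f(g) = \left(\tfrac{1}{2}\right)^{g},
\qquad
f(5) = \tfrac{1}{32} \approx 3.1\%
```

Regions tightly linked to the selected mutation or transgene are expected to retain donor sequence longer than this. The estimate therefore describes average unlinked background rather than every locus.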

mir-34 deletion mutants. Deletion of the mir-34 region was made by site-specific 
recombination between two piggyBac insertions, using FLP-FRT-mediated
site-specific recombination. The loss of other genes in the region was then fully
rescued by genomic transgenes, so that a line selectively lacking only mir-34 was 
generated. Two FRT-bearing insertions, PBac[XP]d02752 and PBac[RB]Fmr1°°?””, 
were used (Exelixis collection), which encompass the mir-34 region. Genetic crosses 
were made to combine these two transposon elements with heat-inducible FLP 
recombinase. After 48 h of egg laying, parents were removed, and vials containing
progeny were placed in a 37 °C water bath for a 1-h heat shock. Progeny flies were 
treated with daily 1-h heat shock, for an additional 4 days. Young virgin female 
progeny flies were collected and crossed to males with 3rd chromosome balancers. 
In the subsequent generation, progeny males were used to generate additional 
progeny for PCR confirmation. Progeny flies bearing the deletion were positive 
for PCR verification, using primers from neighbouring genomic DNA and from the
transposons (upstream insertion: 5'-GGTCGTGCATGACGAGATTA-3'/
5'-TACTATTCCTTTCACTCGCACTTATTG-3'; downstream insertion:
5'-TCCAAGCGGCGACTGAGATG-3'/5'-GTGCGTTCGAAGAAATGATG-3'). Flies
with the mir-34 region deletion were viable, and were further verified for the
appropriate deletion by PCR amplification, with primers for mir-317
(5'-CGGAAAAACGGTTTTGTGTCT-3'/5'-CCCGGGAACGAGTAAACGAAATGAAAATCA-3'),
mir-277 (5'-TGATTTATGGTTTTTGTTTCAGTTG-3'/
5'-TTGATATCATTTCACACTATCACAAAAATTGC-3'), mir-34
(5'-ACCTTGAGCGCTTCAACTCT-3'/5'-CACTCTTTCTCGTTTGCATGG-3') and dfmr1
(5'-CACACAGAGCTTCCCAGTGA-3'/5'-AGGCCCTCCTTTTTGACATT-3').

Fly age-associated phenotypes. Negative geotaxis and thermo stress were used to 
examine fly locomotion and stress resistance, respectively. To perform negative 
geotaxis, groups of 15 adult male flies of indicated age were transferred into a 
14-ml polystyrene round-bottom tube (Falcon), and placed in the dark for 30-min 
recovery. The assay was conducted in the dark, with only a red light on. Climbing 
ability was scored as the percentage of flies failing to climb higher than 1.5 cm from 
the bottom of the tube, within 15 s after gently being banged to the bottom. Three 
repeats were performed for each group and the result averaged. For each genotype 
at a given age, a minimum of 200 flies were tested. For heat sensitivity, groups of 15 
adult males of indicated age were transferred into 14-ml polystyrene round-bottom
tubes (Falcon), then placed in a 25 °C incubator for 30-min recovery.
Heat stress was applied by immersing the tube containing the flies in a 37 °C
water bath for 1 h, followed by a 30-min recovery at 25 °C, then another 1-h heat
stress at 37 °C. Flies were then transferred into regular food vials and maintained at
25 °C. Dead flies were counted after 24 h. To assess brain morphology, adult male
heads were processed for paraffin sections as described previously, and brain vacuoles were
counted through continuous sections generated from each head (n = 10 heads 
counted for each genotype). 
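The climbing and heat-stress scoring described above reduces to simple proportions, summarized as mean ± s.e.m. across experiments as in the Figure 2 legend. The sketch below is illustrative only. The per-fly height input, the function names and the example numbers are hypothetical. A fly is counted as failing if it does not get above 1.5 cm within the 15-s window.

```python
from math import sqrt
from statistics import mean, stdev

CLIMB_THRESHOLD_CM = 1.5  # from the negative geotaxis protocol above


def percent_failing(heights_cm, threshold_cm=CLIMB_THRESHOLD_CM):
    """Percentage of flies in one trial that did not climb above the threshold.

    heights_cm: hypothetical per-fly heights reached 15 s after tapping down."""
    failed = sum(1 for h in heights_cm if h <= threshold_cm)
    return 100.0 * failed / len(heights_cm)


def group_score(trials):
    """Average the repeated trials (three per group in the protocol) for one tube."""
    return mean(percent_failing(t) for t in trials)


def percent_surviving(n_alive_at_24h, n_tested):
    """Heat-stress survival, scored 24 h after the second 37 °C exposure."""
    return 100.0 * n_alive_at_24h / n_tested


def mean_sem(values):
    """Mean and standard error of the mean across independent experiments."""
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical tube of four flies scored in three trials.
trials = [[0.5, 2.0, 1.0, 3.0], [0.4, 2.2, 1.8, 3.1], [1.6, 2.0, 0.9, 2.5]]
print(group_score(trials))
print(mean_sem([50.0, 48.0, 52.0]))
```

Computing the s.e.m. across independent experiments, rather than across individual flies, matches the legends' "mean ± s.e.m. of 3 experiments".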

Molecular biology. Fly genomic DNA was prepared from whole flies with the 
Puregene DNA purification kit (Qiagen). To generate mir-34 pUAST constructs,
PCR amplification was conducted using genomic DNA as template, with primer
pairs for pUAST mir-34-I (286 bp; PCR primers
5'-CCGTTACACACGACTATTCTCAAT-3'/5'-CCATCTGATACAGGTCCTACATTTTCTAAAA-3')
and pUAST mir-34-II (936 bp; PCR primers 5'-ACCTTGAGCGCTTCAACTCT-3'/
5'-CACTCTTTCTCGTTTGCATGG-3'). PCR products were then ligated into
the pUAST vector. The mir-277/dfmr1 rescue construct was made in the pCaSpeR4
vector and contained two parts. Part 1 was a genomic DNA fragment (7,530 bp)
harbouring the mir-277 sequence (PCR primers: 5'-GGTCGTGCATGACGAGATTA-3'/
5'-GGATGTTTTGCGACCAACTT-3'), and part 2 was a genomic fragment containing
dfmr1 genomic sequence, derived from the pBS WTR construct (a gift from
T. Jongens) by BamHI and PpuMI digestion. The mir-34 genomic
rescue construct was also made in the pCaSpeR4 vector, with two parts. One
was a genomic DNA fragment (6,855 bp) upstream of the mir-277 sequence (PCR
primers: 5'-GGTCGTGCATGACGAGATTA-3'/5'-GGATGCATTTTATCGTTAGGC-3'), and
the other was a genomic DNA fragment (2,111 bp) containing mir-34 sequence
(PCR primers: 5'-GCAGGAAAATGCGATAAATGA-3'/5'-TCGTTACAACATGGAAATCCTC-3').
The resultant construct therefore contains the mir-34 sequence, including most of
the upstream fragment, with the exclusion of 108 bp of mir-277 sequence. In
addition, a modified mir-34 genomic rescue construct was made (pCaSpeR4 vector),
which contains the same upstream and downstream ends as the original mir-34
genomic rescue construct, with a small deletion of the miR-277 mature sequence.
The genomic regulation of mir-34 seems complex: despite these standard
manipulations for gene rescue, the genomic rescue expression of mir-34 and the
extent of phenotypic rescue of mir-34 mutants were only partial.
We attempted upregulation of mir-34 with the GAL4-UAS system, including with 
the conditional gene switch system in adults. Upregulation of mir-34 during 
development in non-germline tissues (when it normally is not expressed; Sup- 
plementary Fig. 2a) was deleterious, and we were unable to upregulate mir-34 
expression more robustly than with the genomic constructs. 

For western immunoblots, 10 adult male heads per sample were homogenized
in 50 µl of Laemmli buffer (Bio-Rad) supplemented with 5% 2-mercaptoethanol,
heated to 95 °C for 5 min, and 10 µl was loaded onto 4-12% Bis-Tris gels (NuPAGE),
then transferred to nitrocellulose membrane (Bio-Rad) and blotted by standard
protocols. Primary antibodies used were anti-tubulin (1:10,000, E7, Developmental
Studies Hybridoma Bank) and anti-E74A (a gift of C. Thummel). Secondary
antibodies for immunoblots were goat anti-mouse conjugated to HRP (1:2,000,
Chemicon), and blots were developed by chemiluminescence (ECL, Amersham). The
final image was obtained with a Fuji scanner (Fujifilm).

Total RNA was isolated from 50-200 male heads per genotype: heads were cut off
with a sharp razor, placed in Trizol reagent and ground with a pestle, and RNA was
then isolated following the manufacturer's protocol (Trizol reagent, Invitrogen).
5 µg RNA was used per lane. Gel running (1% agarose) and blot transfer (nylon plus)
were performed according to recommended procedures (NorthernMax, Ambion). The RNA
blot was then hybridized following standard procedures at 68 °C, with
pre-hybridization (~1 h) and hybridization (~12 h or overnight) with 32P-labelled
probe, then washed and exposed to a Phosphoimager (Amersham). RNA probes were made
by in vitro transcription of cDNA templates using the Maxiscript-T7 in vitro
transcription kit (Ambion), supplemented with 32P-labelled UTP. The cDNA templates
were prepared from total RNA by one-step RT-PCR (SuperScript One-Step RT-PCR with
Platinum Taq, Invitrogen), with primers: E74A (5'-GTGAACGTGGTGGTGGAAC-3'/
5'-GATAATACGACTCACTATAGGGAGATGTCCATTCGCTTCTCAATG-3');
E74B (5'-CATCGCTTGTCAATGTGTCC-3'/
5'-GATAATACGACTCACTATAGGGAGACTGCGGTAATCACTGAGCTG-3'); 18S rRNA loading control
(5'-GATAATACGACTCACTATAGGGAGA-3'/
5'-AGGGAGCCTGAGAAACGGCTACCACATCTAAGGAATCTCCCTATAGTGAGTCGTATTATC-3').

For small RNA northern blots, total RNA was isolated from male fly heads using
Trizol reagent as above. For each lane, 3 µg of RNA was used, and RNA was
fractionated on a 15% Tris-urea gel (NuPAGE) with 1× TBE buffer. Blot
transfer was performed with 0.5× TBE buffer. Before hybridization, the RNA blots
were pre-hybridized with Oligohyb (Ambion) and then incubated with radioactively
labelled RNA probes for ~12 h to overnight. RNA probes were made by
in vitro transcription of oligonucleotide templates using the Maxiscript-T7 in vitro
transcription kit (Ambion), supplemented with 32P-labelled UTP. Oligo DNA templates
were prepared by annealing two single-stranded DNA oligonucleotides into a duplex
(99 °C for 5 min, then cooling to room temperature). Oligonucleotides used were:
mir-34 (5'-GATAATACGACTCACTATAGGGAGA-3'/
5'-AAAAAATGGCAGTGTGGTTAGCTGGTTGTGTCTCCCTATAGTGAGTCGTATTATC-3'),
mir-277 (5'-GATAATACGACTCACTATAGGGAGA-3'/
5'-TAAATGCACTATCTGGTACGACATAAATGCACTATCTGGTACGACATCTCCCTATAGTGAGTCGTATTATC-3')
and 2S rRNA (5'-GATAATACGACTCACTATAGGGAGA-3'/
5'-TGCTTGGACTACATATGGTTGAGGGTTGTATCTCCCTATAGTGAGTCGTATTATC-3').
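Each template pairs a T7 promoter top strand with a bottom strand that ends in the promoter's reverse complement, so the expected probe can be derived mechanically. The minimal Python sketch below assumes the usual T7 convention that transcription starts at the first G of the promoter's 3'-terminal GGGAGA. The helper names `reverse_complement` and `t7_transcript` are hypothetical and do not belong to any kit software.

```python
# Hypothetical helpers for checking the annealed-oligo T7 templates above.
# Assumption: the top oligo is the T7 promoter and transcription starts at the
# first G of its 3'-terminal GGGAGA (the usual T7 +1 position).
T7_TOP = "GATAATACGACTCACTATAGGGAGA"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def t7_transcript(top: str, bottom: str) -> str:
    """Predict the run-off RNA made from an annealed top/bottom oligo pair.

    The bottom (template) strand must end with the reverse complement of the
    top (promoter) strand; everything 5' of that is the probe region.
    """
    top, bottom = top.upper(), bottom.upper()
    promoter_rc = reverse_complement(top)
    if not bottom.endswith(promoter_rc):
        raise ValueError("bottom oligo does not anneal to the T7 promoter oligo")
    probe_region = bottom[: -len(promoter_rc)]
    # +1 is the G immediately after the TATA box of the promoter strand.
    start = top.index("TATAG") + 4
    coding_strand = top[start:] + reverse_complement(probe_region)
    return coding_strand.replace("T", "U")


MIR34_BOTTOM = ("AAAAAATGGCAGTGTGGTTAGCTGGTTGTG"
                "TCTCCCTATAGTGAGTCGTATTATC")
print(t7_transcript(T7_TOP, MIR34_BOTTOM))
# GGGAGACACAACCAGCUAACCACACUGCCAUUUUUU
```

For the mir-34 pair, the predicted transcript is GGGAGA followed by the reverse complement of the 30-nt overhang. It is therefore antisense to the miR-34 sequence embedded in the bottom oligo, which is what a northern probe needs.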

Luciferase assays were performed using standard approaches. Specifically,
8 X 10* DL1 cells were plated in 30 µl of serum-free medium with
60 ng of dsRNA in each well of a 96-well plate. The next day, 1.6 ng of pMT-Firefly,
400 ng of pMT-mir-34 and 400 ng of pMT-Renilla E74A wild-type or mutant
3' UTR reporters were transfected using Effectene (Qiagen). Two days after
transfection, expression of the reporters and mir-34 was induced with CuSO4.
Twenty-four hours after induction, luminescence was measured with the Dual-Glo
Luciferase Assay System (Promega). The mir-34 seed sequences in the 3' UTR of
E74A were mutated as noted in Supplementary Fig. 3, using the QuikChange
mutagenesis system (Stratagene). Primers used to knock down Ago1 have been
described previously.
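The text does not say how the Dual-Glo readings were normalized. A common convention, assumed here, divides each well's UTR-bearing reporter signal (Renilla in the constructs above) by its co-transfected Firefly control. The normalized wild-type-UTR signal is then compared with the seed-mutant UTR. The helper names are hypothetical, and the numbers are illustrative rather than data from this study.

```python
from statistics import mean


def normalized_signal(renilla, firefly):
    """Per-well ratio of the 3' UTR reporter (Renilla) to the Firefly control."""
    if len(renilla) != len(firefly):
        raise ValueError("need one Firefly reading per Renilla reading")
    return [r / f for r, f in zip(renilla, firefly)]


def relative_reporter_activity(wt_renilla, wt_firefly, mut_renilla, mut_firefly):
    """Mean normalized wild-type-UTR signal relative to the seed-mutant UTR.

    Values below 1.0 mean the wild-type 3' UTR is repressed relative to the
    mutant, as expected if mir-34 acts through the mutated seed sites.
    """
    wt = mean(normalized_signal(wt_renilla, wt_firefly))
    mut = mean(normalized_signal(mut_renilla, mut_firefly))
    return wt / mut


# Illustrative readings only (not data from this study).
ratio = relative_reporter_activity([50, 60], [100, 100], [100, 120], [100, 120])
print(round(ratio, 2))  # 0.55
```

Normalizing within each well before averaging removes well-to-well differences in cell number and transfection efficiency. That variation would otherwise swamp a modest miRNA effect.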

The miRNA target-prediction algorithms TargetScan (v5.1) and PicTar (fly)
were used to identify candidate miR-34 target mRNAs.
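TargetScan and PicTar combine seed pairing with other evidence, such as conservation and site context. Only the core seed-match step is sketched below: scanning a 3' UTR for sites complementary to miRNA nucleotides 2-8. Two assumptions apply. The miRNA is taken to be the 24-nt sense sequence embedded in the mir-34 northern-probe oligonucleotide above, and the UTR in the example is a made-up string, not the E74A 3' UTR. The function names are hypothetical.

```python
# Assumption: mature miR-34 taken as the 24-nt sense sequence embedded in the
# mir-34 probe oligo above; a real analysis should use the miRBase entry.
MIR34 = "UGGCAGUGUGGUUAGCUGGUUGUG"
_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_site(mirna: str) -> str:
    """mRNA 7mer-m8 site: reverse complement of miRNA nucleotides 2-8."""
    seed = mirna.upper().replace("T", "U")[1:8]
    return "".join(_PAIR[base] for base in reversed(seed))


def find_seed_matches(utr: str, mirna: str) -> list:
    """Return (0-based start, site type) for every 8mer or 7mer-m8 match."""
    utr = utr.upper().replace("T", "U")
    site = seed_site(mirna)
    hits = []
    start = utr.find(site)
    while start != -1:
        end = start + len(site)
        # An A opposite miRNA position 1 upgrades a 7mer-m8 to an 8mer site.
        kind = "8mer" if utr[end:end + 1] == "A" else "7mer-m8"
        hits.append((start, kind))
        start = utr.find(site, start + 1)
    return hits


toy_utr = "AAACACUGCCAGGGGCACUGCCUUU"  # made-up sequence, not E74A
print(seed_site(MIR34))                  # CACUGCC
print(find_seed_matches(toy_utr, MIR34))  # [(3, '8mer'), (15, '7mer-m8')]
```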
miRNA microarray analysis. For miRNA array analysis, Iso31 flies (isogenized
w1118) were used. Flies were killed by brief submersion in ethanol under CO2
anaesthesia, followed by two PBS washes (Sigma). To control for circadian effects,
all flies were processed between 11:00 and 13:00. Brains were removed manually
and collected in an Eppendorf microcentrifuge tube kept on ice. For each
miRNA microarray replicate, 200-300 brains were collected for each time point,
with a ~50:50 ratio of males and females. RNA was prepared using the mirVana
RNA extraction system (Ambion), yielding ~2.5 µg per 100 brains. RNA was
eluted into 80 µl of RNase-free water (Fisher Scientific) and stored at −80 °C.
miRNA profiling was carried out at the Penn Microarray Core Facility using
miRCURY LNA arrays and protocols (Exiqon), with Exiqon's Hy3/Hy5 labelling kit.
RNA samples were labelled with Hy3 and hybridized together with a Hy5-labelled
common reference standard, which consisted of equal amounts of RNA from brains of
3-day-, 30-day- and 60-day-old flies. The miRNA microarray data were analysed at
the Penn Bioinformatics Core. Raw data were imported into GeneSpring 1.0 (Agilent)
and normalized using a global LOESS (locally weighted scatterplot smoothing)
regression algorithm. Relative expression levels were calculated as the log2
normalized signal intensity difference between the Hy3 and Hy5 channels.
Present/absent flagging was performed by Exiqon. Expression levels (fold changes)
for the 30-day and 60-day time points were calculated relative to the 3-day time
point. The data sets were exported into Spotfire DecisionSite 9.0 (Tibco) for
visualization and filtering.
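The common-reference design means the Hy5 term cancels when time points are compared. The sketch below shows that arithmetic for a single probe. It assumes the Hy3 and Hy5 intensities have already been background-corrected and LOESS-normalized, and the LOESS step itself is not shown. The function names and intensities are hypothetical.

```python
import math


def log2_ratio(hy3: float, hy5: float) -> float:
    """log2(sample / common reference) for one probe.

    Assumes hy3 and hy5 are already background-corrected and
    LOESS-normalized; the normalization step itself is not shown.
    """
    return math.log2(hy3) - math.log2(hy5)


def fold_change_vs_day3(log_ratios: dict) -> dict:
    """Linear fold change of each time point relative to day 3.

    Because every array shares the same Hy5 reference, the reference term
    cancels: log2(day X / day 3) = M_X - M_3.
    """
    base = log_ratios[3]
    return {day: 2 ** (m - base) for day, m in log_ratios.items() if day != 3}


# Illustrative intensities for a single probe (not data from this study).
spots = {3: (1024.0, 1024.0), 30: (2048.0, 1024.0), 60: (4096.0, 1024.0)}
ratios = {day: log2_ratio(hy3, hy5) for day, (hy3, hy5) in spots.items()}
print(fold_change_vs_day3(ratios))  # {30: 2.0, 60: 4.0}
```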

mRNA microarray analysis. For the ageing microarray analysis, fly stock Iso31 was
used. For the mir-34 mutant microarray analysis, mir-34 null line-1 in the 5905
background was used, with the 5905 line as control. To generate an ageing profile,
flies were aged to 3 days, 30 days and 60 days, and 30-50 brains were dissected per
time point, per replicate, as above (50:50 males and females). For each time point,
three replicates were conducted. For the mir-34 mutant microarray analysis, time
points were 3 days and 20 days, and for each time point, 20 brains from male flies
of the appropriate genotype were used, with five replicates in total. Microarray
hybridization and reading were performed at the Penn Microarray Core Facility. For
mRNA microarrays, total RNA was reverse transcribed to ss-cDNA, followed
by two PCR cycles using the Ovation RNA amplification system V2 (Ovation).
Quality control on both RNA and ss-cDNA was performed using an Agilent 2100
Bioanalyzer (Quantum Analytics). The cDNA was labelled using the FL-Ovation
cDNA Biotin Module V2 (Ovation), hybridized to Affymetrix Drosophila 2.0 chips
(Affymetrix) and scanned with an Axon Instruments 4000B scanner using
GenePix Pro 6.0 image acquisition software (Molecular Devices). Affymetrix
.cel (probe intensity) files were exported from GeneChip Operating Software
(Affymetrix). The .cel files were imported into ArrayAssist Lite (Agilent), in which
GCRMA probe-set expression levels and Affymetrix absent/present/marginal
flags were calculated. Statistical analysis of genes passing the flag filter
was performed using Partek Genomics Suite (Partek). The signal values were
log2-transformed and a two-way ANOVA was performed.
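The ANOVA model fitted in Partek is not described beyond "two-way". As an illustration only, the sketch below computes sums of squares and F statistics for one probe set under three assumptions. The two factors are taken to be genotype (control versus mir-34) and age (3 versus 20 days), the design is balanced, and an interaction term is included. The function name and input layout are hypothetical.

```python
from itertools import product
from statistics import mean


def two_way_anova(cells: dict) -> dict:
    """Balanced two-way ANOVA with interaction on log2 signal values.

    cells maps (level_of_A, level_of_B) -> list of replicate values; every
    combination must be present with the same number of replicates.
    Returns {"A" | "B" | "AxB": (sum of squares, df, F)}; P values would
    come from the F distribution with (df, df_error) degrees of freedom.
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))
    if len(cells) != len(a_levels) * len(b_levels) or any(
        len(v) != n for v in cells.values()
    ):
        raise ValueError("design must be complete and balanced")

    grand = mean(x for v in cells.values() for x in v)
    cell_mean = {k: mean(v) for k, v in cells.items()}
    a_mean = {a: mean(cell_mean[a, b] for b in b_levels) for a in a_levels}
    b_mean = {b: mean(cell_mean[a, b] for a in a_levels) for b in b_levels}

    ss_a = len(b_levels) * n * sum((a_mean[a] - grand) ** 2 for a in a_levels)
    ss_b = len(a_levels) * n * sum((b_mean[b] - grand) ** 2 for b in b_levels)
    ss_ab = n * sum(
        (cell_mean[a, b] - a_mean[a] - b_mean[b] + grand) ** 2
        for a, b in product(a_levels, b_levels)
    )
    ss_err = sum((x - cell_mean[k]) ** 2 for k, v in cells.items() for x in v)

    df_a, df_b = len(a_levels) - 1, len(b_levels) - 1
    df_err = len(a_levels) * len(b_levels) * (n - 1)
    ms_err = ss_err / df_err
    return {
        "A": (ss_a, df_a, ss_a / df_a / ms_err),
        "B": (ss_b, df_b, ss_b / df_b / ms_err),
        "AxB": (ss_ab, df_a * df_b, ss_ab / (df_a * df_b) / ms_err),
    }


# Toy log2 values (not data from this study): genotype x age, 2 replicates.
toy = {("control", 3): [1.0, 3.0], ("control", 20): [2.0, 4.0],
       ("mir-34", 3): [1.0, 3.0], ("mir-34", 20): [4.0, 6.0]}
print(two_way_anova(toy))
# {'A': (2.0, 1, 1.0), 'B': (8.0, 1, 4.0), 'AxB': (2.0, 1, 1.0)}
```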


LETTER 


Transcriptional analysis of ageing status. We first used the wild type to extract
age-associated probe sets and then compared the relative changes of these probe sets
in a separate set of transcriptional profiles generated for the wild type and mir-34
mutant. For transcriptional profiles of normal aged brains, the GCRMA package
(J. Z. Wu, J. MacDonald and J. Gentry, GCRMA: background adjustment
using sequence information, R package version 2.14) for R/Bioconductor was used
to generate log2 expression levels for probe-set IDs from the original .cel files.
Then, a linear regression model was used to compute the significance of a
correlation between age and gene expression. This approach assumes a linear
relationship between age and log2 expression level:


$$Y_{ij} = \mu_i + \beta_i A_j + \varepsilon_{ij}$$


In this equation, $Y_{ij}$ is the log2 expression level of probe set $i$ in sample $j$, $A_j$ is
the age of individual $j$, $\mu_i$ is the intercept for probe set $i$ and $\varepsilon_{ij}$ is the
residual error. The regression coefficient $\beta_i$ reflects the rate of change in
expression of probe set $i$ with respect to age. Probe sets with expression
significantly correlated with age (P ≤ 0.001 for $\beta_i$) were identified. The
same probe sets were then used to estimate relative expression in separate profiles of
mir-34 mutants and age-matched controls. For each probe set, the average level was
used to calculate the difference between 20 days and 3 days within the same genotype
(that is, A20 day/A3 day), separately for controls and mir-34 mutants. These
differences were then compared between genotypes (that is, mir-34 mutants − controls),
and the significance of the difference between genotypes was assessed using a paired
Wilcoxon test. The difference between control and mutant samples in positively
correlated genes (Fig. 2d) is unlikely to be due to chance (P = 0.0001).
Neisseria are obligate human pathogens causing bacterial meningitis, septicaemia and gonorrhoea. Neisseria require iron 
for survival and can extract it directly from human transferrin for transport across the outer membrane. The transport 
system consists of TbpA, an integral outer membrane protein, and TbpB, a co-receptor attached to the cell surface; both 
proteins are potentially important vaccine and therapeutic targets. Two key questions driving Neisseria research are how 
human transferrin is specifically targeted, and how the bacteria liberate iron from transferrin at neutral pH. To address 
these questions, we solved crystal structures of the TbpA-transferrin complex and of the corresponding co-receptor 
TbpB. We characterized the TbpB-transferrin complex by small-angle X-ray scattering and the TbpA-TbpB-transferrin 
complex by electron microscopy. Our studies provide a rational basis for the specificity of TbpA for human transferrin, 
show how TbpA promotes iron release from transferrin, and elucidate how TbpB facilitates this process. 


Neisseria comprise a large family of Gram-negative bacteria that 
colonize humans. Two family members, Neisseria gonorrhoeae and 
Neisseria meningitidis, are pathogens that invade the urogenital tract 
and nasopharynx, respectively, causing gonorrhoea, meningitis and 
other systemic infections. Although vaccines exist for bacterial 
meningitis, they have significant limitations and are ineffective
against serogroup B N. meningitidis. Currently, there are no vaccines
to protect against gonococcal infections. The recent emergence of
antibiotic-resistant strains adds urgency to the need to develop more
effective countermeasures for both pathogens. 

Neisseria require iron for survival and virulence. Unlike most
Gram-negative bacteria, Neisseria do not make siderophores but instead
extract iron directly from serum transferrin (TF) in the human host.
The neisserial transport system consists of two large surface proteins:
TF-binding protein A (TbpA), a 100-kDa integral outer membrane
protein belonging to the family of TonB-dependent transporters;
and TF-binding protein B (TbpB), an ~80-kDa co-receptor attached
to the outer membrane by a lipid anchor (Supplementary Fig. 1). Both
proteins are found in all clinical isolates of pathogenic Neisseria. TbpA
binds apo and iron-containing transferrin with similar affinities,
whereas TbpB associates only with iron-bound TF. Although TbpA
can extract and import iron without TbpB, the process is considerably
more efficient in the presence of the co-receptor. TbpA and TbpB
induce bactericidal antibodies in mice against N. meningitidis and
N. gonorrhoeae, making both proteins important vaccine targets. To
elucidate how TbpA and TbpB function to bind human TF selectively 
and extract its tightly bound iron (K, = 10°° M_') at physiological pH, 
we combined X-ray crystallography,
small-angle X-ray scattering and electron microscopy to determine a 
model of the 260-kDa iron import complex from N. meningitidis 
strain K454 (serogroup B). Because N. gonorrhoeae strains FA1090 
and FA19 express TbpA proteins that are 94% identical to the 
meningococcal protein, whereas the corresponding TbpB proteins
are 61% and 69% identical, respectively, our results are relevant to
both pathogens.


Crystal structure of the TbpA-(apo)hTF complex 


Structural characterization of the neisserial iron import machinery 
was initiated by crystallizing N. meningitidis TbpA with full-length, 
glycosylated apo human transferrin (hTF) and solving the structure to 
a resolution of 2.6 Å (Fig. 1, Supplementary Figs 2-4 and 17, and
Supplementary Table 1). Despite being significantly larger (~20%) than
other structurally characterized TonB-dependent transporters,
TbpA retains the classic fold, with a 22-strand transmembrane β-barrel
encompassing a plug domain (Fig. 1a). Most of the additional mass is
found in several extracellular loops, which extend up to ~60 Å above
the outer membrane. A plug loop implicated in iron uptake is
unusually long and protrudes ~25 Å above the cell surface.

Human TF is a bilobal glycoprotein (~80 kDa) with a single ferric
(Fe3+) iron tightly bound within a cleft in each lobe (Fig. 1a and
Supplementary Fig. 8). Each lobe of TF consists of two subdomains
that form the cleft: N1 and N2 in the N lobe, and C1 and C2 in the C lobe.
In the absence of iron, each lobe adopts an open conformation (Protein Data
Bank (PDB) accession code 2HAV). To obtain the best model of neisserial iron
import, we solved the structure of diferric hTF at 2.1 Å resolution (Fig. 2d and
Supplementary Fig. 8). In our diferric structure, each lobe is found in a
fully closed conformation, nearly identical to the diferric structures
of porcine (PDB code 1H76) and rabbit (PDB code 1JNF) TF.

When TbpA binds hTF, it buries ~2,500 Å2 of surface area,
with 81 TbpA residues and 67 hTF residues participating in the
interaction (Fig. 1a, c, e and Supplementary Table 2). TbpA binds
exclusively to the C lobe of hTF, where electrostatic complementarity exists
between the extracellular surface of TbpA (electropositive) and the
C1 subdomain of hTF (electronegative) (Fig. 1b, d). Two notable
features of the interface are that (1) the unusually long TbpA plug
loop (residues 121-139) interacts directly with the C1 subdomain


Figure 1 | Crystal structure of the TbpA-(apo)hTF complex. a, The TbpA 
β-barrel is lime, the plug is red and the helix finger is magenta. For hTF, the C lobe
is gold and the N lobe is salmon. A ferric ion has been modelled into the C lobe
as a red sphere. b, Electrostatic potential of TbpA viewed from the extracellular
surface with hTF shown in gold ribbon. c, Residues of TbpA that bind hTF: 


(Figs 1a, 2a, c and Supplementary Fig. 5), and (2) an α-helix in TbpA
extracellular loop 3 (residues 351-361, the L3 ‘helix finger’) is inserted 
directly into the cleft of the C lobe between the C1 and C2 subdomains 
(Figs 1a, 2a, b and Supplementary Fig. 5). The interaction between 
TbpA and hTF was found to be relatively tolerant to point mutations 
in TbpA, as might be expected given the large binding interface 
(Supplementary Fig. 6). 

The mechanism of species specificity of neisserial TbpA for hTF is 
unknown. In in vitro assays, gonococcal and meningococcal TbpA 
proteins have been shown to bind human TF, but not TF from cow, 
horse, rabbit, mouse, rat, sheep, duck or pig. In addition, mice


Figure 2 | TbpA distorts the iron coordination site in the hTF C lobe by 
inserting a helix from extracellular loop three. a, TbpA (green) inserts a helix 
finger from extracellular loop 3 (magenta) into the cleft of the hTF C lobe 
(gold). b, The helix finger interacts with the hTF C-lobe residues through main- 
chain and side-chain interactions. c, The long TbpA plug loop (pink) interacts 


gold, hydrophobic interactions; green, hydrogen bonds; red, salt bridges 
(residues labelled). d, Electrostatic potential of hTF viewed from the 
extracellular surface with TbpA shown in green ribbon. e, Surface 
representation of hTF showing regions that bind TbpA. 


infected with N. meningitidis displayed a higher mortality rate when 
the iron source was Fe2hTF rather than bovine TF. From the TbpA-hTF
crystal structure, seven sites spanning both the C1 and C2
subdomains of hTF participate in binding TbpA (Fig. 1e and
Supplementary Fig. 7), with each site containing one or more residues
unique to human TF. 

Because TbpA shows limited sequence variation (Supplementary 
Fig. 5 and Supplementary Table 4), is present in all clinical isolates, 
and nearly all the interactions with hTF are mediated by extracellular 
loops of TbpA, we attempted to disrupt the TbpA-hTF interface to see 
if this would be a viable therapeutic strategy. Peptides from four loops 




with residues from the C1 subdomain. d, Comparison of C-lobe conformations 
for holo (green), apo (grey, PDB code 2HAU), and TbpA-bound TF (gold). 
e, Superposition of residues coordinating iron in diferric hTF (grey) with the 
same residues in hTF when bound to TbpA (gold). Distances for the residues 
coordinating iron are shown for the TbpA-bound state. 
Figure 3 | SAXS analysis of the N. meningitidis TbpB-(holo)hTF complex. 
a, Superposition of TbpB from N. meningitidis with three TbpB structures from 
porcine pathogens. Whereas C lobes align closely, the N lobes show sequence 
and structural variability. Residues that diminish hTF binding when mutated 
are shown as spheres. Position 206 (proline in N. meningitidis) is critical for 
interaction with hTF. b, Competition ELISA showing relative binding affinities 
of TbpA and TbpB for apo-hTF, holo-hTF, hTF-FeN and hTF-FeC. Each
experiment was performed in triplicate and data reported with standard errors. 
c, Fitting of the TbpB-(holo)hTF complex model into the averaged ab initio 
envelope was performed using Chimera. TbpB is shown in cyan, and hTF is 
shown in salmon (N lobe) and gold (C lobe). 


(loops 3, 7, 11 and the plug loop) that make substantial contacts with 
hTF were used as antigens for polyclonal antibody development 
(Supplementary Table 3). All four antibodies reduced hTF binding 
(Supplementary Fig. 9). These results show that although the TbpA- 
hTF interface is extensive, reagents can be designed to disrupt it. 


TbpA induces a conformational change in the hTF C lobe 
In the full-length apo-hTF structure, both the N and C lobes are in
‘open’ conformations, with 59.4° and 49.5° rotations required to align 
subdomains with diferric hTF (Fig. 2d and Supplementary Fig. 8). In 
the TbpA-hTF complex crystal structure, the N lobe is in the fully 
Figure 4 | Analysis of the TbpA-TbpB-(holo)hTF triple complex by 
negative stain electron microscopy. a, Model of the triple complex. TbpA is 
green, TbpB is light blue and hTF is gold. b, Purification of the triple complex by 
size-exclusion chromatography. The Coomassie-stained SDS gel shows 
purified TbpA in lane 1, the TbpB-(holo)hTF complex in lane 2, and the TbpA- 
open conformation. Notably, interaction with TbpA causes the C lobe
to adopt a conformation midway between open and closed, with a 24°
rotation required to align the C1 and C2 subdomains with diferric hTF
(Fig. 2d and Supplementary Fig. 10). The TbpA L3 helix finger is
inserted into the cleft, where it interacts with D634 from the C1
subdomain and several residues from the C2 subdomain (Fig. 2a,
b). The long TbpA plug loop also interacts with the surface of the
C1 subdomain of hTF (Fig. 2c). These interactions induce a partial
opening of the cleft in the hTF C lobe, thereby destabilizing the iron
coordination site to facilitate the release of iron from the C lobe to
TbpA. Figure 2e shows the residues coordinating iron in the hTF
C-lobe structure and the substantial increase in their coordination
distances when hTF binds TbpA. Such increases are clearly incompatible
with tight binding of iron in the C lobe.
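Lobe opening is summarized above as a single rotation angle (59.4°, 49.5° and 24°). The text does not state which program or convention was used. Assuming that superposing the C2 subdomains, after the C1 subdomains have been aligned, yields a 3×3 rotation matrix, the angle follows from the trace identity trace(R) = 1 + 2cos θ. A minimal sketch follows, with hypothetical function names.

```python
import math


def rotation_angle_deg(r) -> float:
    """Rotation angle (degrees) of a 3x3 proper rotation matrix.

    Uses trace(R) = 1 + 2 cos(theta); the cosine is clamped to [-1, 1] to
    absorb rounding error in matrices read from superposition output.
    """
    trace = r[0][0] + r[1][1] + r[2][2]
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_theta))


def rot_z(deg: float):
    """Rotation by deg degrees about the z axis (used only for the example)."""
    t = math.radians(deg)
    return [[math.cos(t), -math.sin(t), 0.0],
            [math.sin(t), math.cos(t), 0.0],
            [0.0, 0.0, 1.0]]


print(round(rotation_angle_deg(rot_z(24.0)), 6))  # 24.0
```

The angle depends only on the trace, so it is independent of the rotation axis. Hinge-analysis tools report the axis separately.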


X-ray and SAXS structures of TbpB and TbpB-(holo)hTF


Although only TbpA can acquire iron from hTF, the reaction is
enhanced by expression of the co-receptor TbpB, which preferentially
binds holo-hTF. To understand how TbpB facilitates iron
extraction and uptake, we solved the structure of N. meningitidis
TbpB (Fig. 3a, Supplementary Figs 12 and 18, and Supplementary Table 1).
TbpB consists of structurally similar N and C lobes, each comprising an
eight-strand β-barrel subdomain flanked by a four-strand 'handle'
domain.

TbpB proteins from different isolates vary substantially in size and 
sequence (Supplementary Fig. 11), but the overall fold is conserved. 
Our structure aligns closely with three TbpB structures from porcine 
pathogens and shows that sequence and conformational variations
are found primarily in the N lobe (Fig. 3a and Supplementary Fig. 12b).
Specifically, residues affecting TF binding lie on the distal surface of the
N lobe and confer much of the TF species specificity (Fig. 3a). We
made point mutants at four sites on this surface that reduced or 
abolished binding to hTF (Fig. 3a and Supplementary Fig. 12). An 
analysis of our mutants, and those reported previously, demonstrates 
that the major site of interaction lies in the N lobe. 

To clarify the interactions of purified TbpA and TbpB with hTF, an 
enzyme-linked immunosorbent assay (ELISA) was used to probe 
binding to Fe2hTF, monoferric hTF with iron only in the N lobe
(FeN) or only in the C lobe (FeC), and apo-hTF (incapable of binding iron




TbpB-(holo)hTF complex in lane 3. c, Row 1, set of seven non-redundant class 
averages of negatively stained complexes; row 2, examples of individual images 
assigned to the respective classes; row 3, re-projections of the model in the 
corresponding orientations, band-limited to 15 Å resolution; row 4, surface
renderings of the band-limited model in the corresponding orientations. 
in either lobe). Consistent with earlier studies using apo- or holo-hTF,
TbpA binds all four hTFs with equal affinity regardless of the
iron status of either lobe (Fig. 3b). In contrast, TbpB has a strong 
preference for hTF constructs with iron bound in the C lobe, regardless 
of the coordination state of the N lobe. These experiments clearly show 
that, at least in vitro, hTF interacts with TbpA and TbpB solely through 
the C lobe and is not affected by the presence or absence of iron in the N 
lobe. Our results indicate that Neisseria cannot use the entire serum TF 
iron supply and that the primary function of TbpB is to select and 
concentrate on the cell surface only those forms of TF that are able to 
provide iron to the bacterium. 

Because TbpB primarily binds the C lobe of hTF through its N lobe, 
we performed steered molecular docking for the TbpB-hTF complex 
based on previous docking studies of the porcine complex and on
our mutagenesis results. We collected small-angle X-ray scattering
(SAXS) data on the TbpB-(holo)hTF complex (Supplementary Figs 12
and 13) and used GASBOR to construct the SAXS envelope (Fig. 3c
and Supplementary Fig. 13). The resulting molecular envelope
describes the spatial arrangement of TbpB and hTF and was used to
fit the TbpB-hTF complex structure. Binding of TbpB to hTF buries
~1,300 Å2 of surface area and, notably, uses a region of the hTF C lobe
distinct from the site where TbpA binds.


Structure of the triple complex by single-particle EM 

On the basis of the TbpA-(apo)hTF crystal structure and the SAXS 
solution structure of the TbpB-(holo)hTF complex, we formed an 
in silico model for the TbpA-TbpB-(holo)hTF triple complex by 


Figure 5 | Mechanism for iron import. a, Binding surfaces of TbpA (green) 
and TbpB (cyan) mapped onto the hTF C lobe. b, Enclosed chamber formed by 
TbpA-TbpB-(holo)hTF (left, magenta sphere). A cutaway view (right) from 
inside the chamber illustrates the proximity of the iron (red). c, Model for iron 
release. Conserved K359 in the L3 helix finger is positioned to interact with 
superposing the two complexes along the C1 subdomain of hTF
(Fig. 4a). To test this model, we assembled the triple complex from
its components (Fig. 4b) and visualized the resulting particles by
negative-stain electron microscopy (EM) (Fig. 4c and Supplementary
Fig. 14). A set of 4,240 particles was subjected to reference-free
classification to identify subsets of like images, representing molecules
viewed in the same orientation; the members of each class were then
averaged to suppress noise. Several of the class averages show a central
density, ~45 Å across, to which two small globular densities are
appended at points about 120° apart around its periphery (for
example, Fig. 4c (asterisk) and Supplementary Fig. 14). A plausible
interpretation is that the central density corresponds to the β-barrel
domain of TbpA and the two appended densities to TbpB and hTF
(Supplementary Fig. 14), in agreement with our model for the triple
complex. 
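The averaging step works because independent noise partly cancels. Averaging N aligned images reduces the noise standard deviation by roughly √N, while the shared signal is preserved. The sketch below covers only that step. It assumes the particles have already been aligned and assigned to classes, which is the difficult reference-free part and is not shown. The function name is hypothetical.

```python
from statistics import mean


def class_averages(images, labels):
    """Pixel-wise mean of aligned particle images that share a class label.

    images: equally sized 2D lists of pixel values, already centred and
    rotationally aligned; labels: one class id per image.
    """
    groups = {}
    for image, label in zip(images, labels):
        groups.setdefault(label, []).append(image)
    averages = {}
    for label, members in groups.items():
        rows, cols = len(members[0]), len(members[0][0])
        averages[label] = [[mean(img[r][c] for img in members)
                            for c in range(cols)]
                           for r in range(rows)]
    return averages


imgs = [[[1, 3]], [[3, 5]], [[10, 0]]]
print(class_averages(imgs, [0, 0, 1]))  # {0: [[2, 4]], 1: [[10, 0]]}
```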


Iron extraction and import 

The X-ray, SAXS and EM structures support a consistent arrangement 
for the TbpA-TbpB-(holo)hTF complex. Although TbpA and TbpB 
each bind hTF tightly through the C lobe, they have unique, non- 
overlapping binding sites (Fig. 5a). A consequence of the assembly of 
the triple complex is the formation of an enclosed chamber (volume of 
~1,000 Å3) at the junction of the three protein components, which sits
directly above the plug domain of TbpA (Fig. 5b). This chamber may
serve two important roles in iron acquisition by the bacteria: (1)
preventing diffusion of iron released from hTF; and (2) guiding the iron
towards the β-barrel domain of TbpA for subsequent transport.




residues that regulate iron release in eukaryotic iron uptake. d, Import of iron 
through TbpA. 1, an electrostatic surface depicts cavities between the TbpA 
barrel and plug domain; 2, plug domain constrictions close the tunnel; 3, 
molecular dynamics simulations show removal of constrictions upon 
interaction with TonB. 
A plausible mechanism for iron extraction from hTF is shown in 
Fig. 5c. Insertion of the TbpA L3 helix finger into the cleft between the 
C1 and C2 subdomains of hTF positions a conserved lysine (TbpA 
K359; Supplementary Fig. 4) near the hTF triad of charged residues 
(hTF K534, R632 and D634) that has been implicated in iron release 
from the C lobe. TbpA K359 is well positioned to interact with
D634, which would disrupt the charge neutralization it normally
provides to the two basic triad residues K534 and R632. The resulting
charge repulsion between the hTF C1 and C2 subdomains could
induce cleft opening (as observed in the TbpA-(apo)hTF crystal
structures), distorting the C-lobe iron-binding site and leading to
subsequent iron release. Notably, a recent study indicates that
an hTF D634A mutant releases iron 80-fold faster than the control
at pH 5.6 under the same conditions.

To investigate iron transport across the outer membrane, steered 
molecular dynamics was used to simulate interactions between TbpA 
and TonB (Supplementary Fig. 15). In the ground-state structure, a
large, highly negative transmembrane cavity is located between the 
barrel wall and the plug domain, but access is restricted on the extra- 
cellular side by residues 91-96 (restriction loop) and on the periplasmic 
side by residues 65-71 (helical gate) from the plug domain (Fig. 5d). 
When force (designed to mimic interaction with TonB) is applied to the 
plug domain, it sequentially unfolds beginning with removal of the 
helical gate followed by the restriction loop, producing an unobstructed 
pathway from the extracellular space to the periplasm (Supplementary 
Movie 1). This pathway is lined by the EIEYE motif of the plug 
domain, which contains multiple oxygen donor groups that could
transiently bind iron as it is transported through TbpA. 


Concluding remarks 


Humans and bacteria have each developed unique strategies to acquire 
iron from serum transferrin (Supplementary Fig. 16). Our TbpA-
TbpB-hTF X-ray, SAXS and EM structures indicate a mechanism for 
bacterial uptake of iron with the following characteristics: (1) a large 
TbpA-hTF binding interface with many human-TF-specific inter- 
actions; (2) iron removal from the hTF C lobe by insertion of a helical 
element from TbpA into the iron-binding cleft; and (3) iron transport 
across the outer membrane after TonB-dependent conformational 
changes in the TbpA plug domain. This system allows efficient extraction
of iron despite the extremely high affinity of hTF-bound iron at
neutral pH. The TbpB co-receptor, which is tethered to the cell surface 
by a long, unstructured polypeptide chain, is able to attract and pref- 
erentially bind hTF with iron in the C lobe, thereby increasing the 
efficiency of the system. Crucial to the mechanism, TbpA and TbpB 
associate with different regions of the hTF C lobe, creating an enclosed 
chamber above the plug domain to ensure that iron is efficiently 
sequestered and directionally transported through the TbpA barrel 
(Supplementary Movie 2). Finally, as TbpA and TbpB are surface- 
exposed, antigenic and required for neisserial infections, our structures
provide the necessary information for structure-based vaccine
and drug design.


METHODS SUMMARY 


The TbpA-(apo)hTF complex was crystallized from TbpA expressed in
Escherichia coli and apo-hTF purchased from Sigma-Aldrich. For the
TbpA-(apo)hTF C-lobe structure, hTF incorporating a protease cleavage site between
the N and C lobes was expressed in BHK cells and purified as described. Full-length
N-His-tagged hTFs, including holo, authentic apo and both monoferric forms,
were expressed in BHK cells and purified as described. The diferric hTF structure
was solved using protein purchased from Sigma-Aldrich. The hTF C-lobe
structure was solved using hTF C lobe from a construct containing a TEV protease
site between the N and C lobes, expressed in BHK cells and purified as described.
TbpB was expressed in E. coli. X-ray data were collected at the GM/CA and SER-CAT
beamlines of the Advanced Photon Source synchrotron. SAXS data were collected on
beamline BL4-2 at the Stanford Synchrotron Radiation Lightsource. EM data were
collected on a CM120-LaB6 electron microscope (FEI) operating at 120 kV. Molecular
dynamics simulations were performed using the program NAMD.
Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS

Cloning, expression and purification of TbpA. The N. meningitidis TbpA 
sequence from strain K454 (B15:P1.7,16) was subcloned into pET20b (Novagen) 
containing an N-terminal 10-His tag. TbpA mutants were created by site-directed
mutagenesis with QuikChange (Stratagene). For structural studies,
mutation of M889 to Tyr improved expression levels. TbpA was expressed in 
BL21(DE3) cells at 20 °C without induction in terrific broth (TB) and carbenicillin. 
Expression for the mutants followed the same protocol. 

For purification, cells were re-suspended in lysis buffer (50 mM Tris-HCl,
pH 7.5, 200 mM NaCl, 1 mM MgCl2, 10 µg ml−1 DNase I, 100 µg ml−1 AEBSF)
and lysed by two passages through an Emulsiflex C3 (Avestin) homogenizer at
4 °C. The lysate was centrifuged at 12,000g for 10 min to remove unlysed cells and
the supernatant was incubated with 2% Triton X-100 for 30 min at room
temperature. The mixture was centrifuged at 160,000g for 90 min at 4 °C. The membrane
pellets were re-suspended in 50 mM Tris-HCl, pH 7.5, 200 mM NaCl, 20 mM
imidazole and solubilized by constant stirring with 5% Elugent for 16 h at 4 °C.
Solubilized membranes were centrifuged at 265,000g for 60 min at 4 °C and the
supernatant filtered and applied to a 15-ml Ni-NTA column. TbpA was eluted
with 250 mM imidazole. Peak fractions were concentrated and applied to an
S-300HR Sephacryl size-exclusion column (GE Healthcare) in 20 mM
Tris-HCl, pH 7.5, 200 mM NaCl, 0.8% C8E4 and 0.02% NaN3. Peak fractions were
verified by SDS-PAGE and western blot analysis using an anti-His monoclonal
antibody (Sigma).
Cloning, expression and purification of TbpB. The TbpB sequence (starting at 
residue L22) from N. meningitidis K454 was codon optimized and synthesized by 
GenScript and subsequently subcloned into a pET28b vector (Novagen). TbpB 
was expressed in T7-Express cells (NEB) at 37°C with IPTG induction at an 
optical density at 600nm of 0.75-1.0 with continued expression for 4h. 
Mutants were expressed using the same protocol. 

For purification, cells were harvested and re-suspended in 5 ml PBS per gram of
cell paste and supplemented with 10 µg ml−1 AEBSF and 100 µg ml−1 DNase I.
Cells were lysed by French press and then centrifuged for 45 min at 38,400g. The 
supernatant was applied to a Ni-NTA column and washed with 10 column 
volumes of PBS. A final wash was performed with PBS containing 20mM 
imidazole before elution with PBS/250 mM imidazole. Eluted protein was then 
dialysed against PBS overnight at 4°C. For constructs where the His tag was 
removed, TEV-HIS protease was added, the sample was dialysed and then passed 
through a second Ni-NTA column to remove uncleaved protein and protease. 
Finally, samples were purified by size-exclusion chromatography in PBS/0.02% 
NaN3. 

Crystallization and data collection. For crystallization of the TbpA-hTF
complex, apo-human transferrin (Sigma) was mixed with TbpA at a 2:1 ratio and
incubated on ice for 1 h. The complex was isolated by Sephacryl S300HR
chromatography equilibrated with 20 mM Tris-HCl, pH 7.5, 200 mM NaCl,
10 mM Na-citrate, 1 mM EDTA, 0.8% C8E4 (Anatrace) and 0.02% NaN3.
Fractions corresponding to the TbpA-hTF complex were verified by
SDS-PAGE, pooled and concentrated to 10 mg ml−1. Heptane-1,2,3-triol was added to
3% final concentration and incubated on ice for 30 min, and the protein sample
was then filtered before crystallization. Sparse-matrix screening was performed with a
TTP Labtech Mosquito crystallization robot using hanging-drop vapour
diffusion, and plates were incubated at 21 °C. The best crystals were grown in 24-well
Linbro plates (Hampton Research) from 20% PEG 3350 and 200 mM BaBr2. Data
were collected at the SER-CAT beamline of the Advanced Photon Source of
Argonne National Laboratory and processed using HKL2000. The space
group was P212121 with one molecule per asymmetric unit (ASU) and final cell
parameters a = 91.014 Å, b = 129.362 Å, c = 198.589 Å, α = 90.00°, β = 90.00°, γ = 90.00°.
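The cell parameters above determine the unit-cell volume, the usual starting point for estimating solvent content. As a minimal sketch that is not part of the original analysis, the general triclinic volume formula can be applied to the reported values; the function name `unit_cell_volume` is hypothetical, and cell lengths are assumed to be in ångströms and angles in degrees.

```python
import math

def unit_cell_volume(a, b, c, alpha=90.0, beta=90.0, gamma=90.0):
    """Unit-cell volume (cubic angstroms) from cell lengths and angles.

    General triclinic formula; it reduces to a*b*c for orthorhombic cells
    and to a*b*c*sin(beta) for monoclinic cells.
    """
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)

# TbpA-hTF complex, space group P212121 (orthorhombic)
v_tbpa_htf = unit_cell_volume(91.014, 129.362, 198.589)

# TbpA-hTF C-lobe complex, space group P21 (monoclinic, beta = 94.48 degrees)
v_clobe = unit_cell_volume(58.055, 107.592, 130.721, beta=94.48)
```

Dividing such a volume by the number of asymmetric units and the molecular mass would give the Matthews coefficient; the masses are not stated in this section, so that step is left out.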

For crystallization of the TbpA-hTF C-lobe complex, hTF C lobe was mixed
in a 2:1 ratio with TbpA and the complex isolated by Sephacryl S300HR
chromatography using 20 mM Tris-HCl, pH 7.5, 200 mM NaCl, 0.1% LDAO and
0.02% NaN3. Final crystallization conditions consisted of 21% PEG 1000, 100 mM
sodium acetate buffer (pH 4.8), 200 mM NaCl, 0.1% LDAO and 3% heptane-1,2,3-triol.
Data were collected and processed as described for the TbpA-hTF
complex. The space group was P21 with one molecule per ASU and final cell parameters
a = 58.055 Å, b = 107.592 Å, c = 130.721 Å, α = 90.00°, β = 94.48°, γ = 90.00°.

TbpB was crystallized from a 10 mg ml−1 solution with 2.0 M NaCl and 2.0 M
ammonium sulphate. Data were collected at the GM/CA CAT beamline of the
Advanced Photon Source of Argonne National Laboratory and were
processed using HKL2000. The space group was P21 with two molecules per ASU
and final cell parameters a = 75.288 Å, b = 82.761 Å, c = 111.882 Å, α = 90.00°,
β = 105.95°, γ = 90.00°.

For diferric hTF crystallization, 100 mg of holo-hTF (Sigma) was solubilized and
further purified by Sephacryl S300HR chromatography using 20 mM Tris-HCl,
pH 7.5 and 200 mM NaCl. The protein was then concentrated to ~50 mg ml−1
and crystallized using 100 mM HEPES, pH 7.5, 1.6 M ammonium sulphate and
2% PEG 1000. Red-tinted crystals appeared only after several months and were
extremely sensitive to even slight temperature changes. Drops containing the
crystals were quickly hydrated with 3.4 M ammonium sulphate immediately
before being flash-cooled in liquid nitrogen and stored for data collection. Data
were collected at the SER-CAT beamline of the Advanced Photon Source of
Argonne National Laboratory and processed using HKL2000. The
space group was C2 with six molecules per ASU and final cell parameters
a = 254.53 Å, b = 173.00 Å, c = 150.15 Å, α = 90.00°, β = 123.26°, γ = 90.00°.

For hTF C-lobe crystallization, (holo)C lobe was mixed with excess TbpB N
lobe and the complex isolated by size-exclusion chromatography as above in
25 mM Tris, pH 8.0, 200 mM NaCl. The complex was concentrated to ~10 mg ml−1
and broad screening was performed using a Mosquito crystallization robot.
Several crystallization conditions were observed; however, none was red in colour,
as might be expected for iron-bound crystals, and most contained citrate, which is
a known iron chelator. Data were collected and analysed as for TbpB. The space
group was I422 with one molecule of hTF C lobe per ASU and final cell parameters
a = 95.847 Å, b = 95.847 Å, c = 204.140 Å, α = 90.00°, β = 90.00°, γ = 90.00°. No TbpB N
lobe was present in the crystals.
Structure determination. For TbpA-hTF, we were unable to collect useful heavy-atom
derivatives for experimental phasing, and yields of selenomethionine-substituted
TbpA protein were not sufficient for crystallization. We eventually used
molecular replacement in PHASER-CCP4 to solve the TbpA-hTF complex
structure. Here, we first searched for each of the two domains (N lobe and C
lobe) of hTF using the deposited coordinates (PDB code 2HAV), which produced
good solutions with Z-scores above 8. However, although the electron density for
the hTF molecule was reasonable, the electron density for TbpA was poor and
could not be used for model building. Our attempts to use known
TonB-dependent transporter structures as search models (barrel and plug, together
and individually) were unsuccessful (low Z-scores and LLG scores). We then
aligned the TbpA sequence to the structure-based sequence alignment reported
in our recent review and found that TbpA contained many conserved regions
characteristic of TonB-dependent transporters. Using the alignment between
TbpA and its closest relative, FhuA (10% identity, ClustalW), and trimming
the extracellular loops, 500 models within a root-mean-square deviation of
5 Å were produced using Modeller (Accelrys). Each of these models was then
used for molecular replacement within PHASER-CCP4, with two of them
producing Z-scores above 8. The solution with the highest LLG (containing both
hTF and the TbpA model) was refined in PHENIX, producing initial R/Rfree
values of 0.43/0.48. Further model building was performed using COOT, and
subsequent refinement was done in PHENIX and BUSTER-TNT. During the final
stages of refinement, extra density was observed that mapped to residues
N413 and N611 of hTF, both of which are reported as possible N-linked
glycosylation sites; N-linked glycans were therefore built at these two residues.
The final structure was solved to 2.60 Å resolution with R/Rfree values of 0.22/0.28. The
TbpA-hTF C-lobe structure was solved by molecular replacement using the
coordinates from the full-length TbpA-hTF structure reported here. Two
search models were formed, one for TbpA and one for hTF C lobe.
PHASER-CCP4 was used for molecular replacement, and subsequent refinement was
performed using PHENIX. The structure was solved to 3.1 Å resolution with final
R/Rfree values of 0.24/0.29.
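The R and Rfree values quoted throughout compare observed structure-factor amplitudes with those calculated from the model, with Rfree computed on a test set of reflections excluded from refinement. A minimal sketch of the standard definition follows, assuming the calculated amplitudes are already on the same scale as the observed ones; refinement programs such as PHENIX and BUSTER-TNT also handle scaling and bulk-solvent modelling, which is not reproduced here, and the function names are hypothetical.

```python
def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|) over the given reflections.

    Assumes f_calc has already been scaled to f_obs.
    """
    num = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    den = sum(abs(fo) for fo in f_obs)
    return num / den

def r_work_r_free(f_obs, f_calc, is_free):
    """R over the working reflections and Rfree over the flagged test reflections."""
    work = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if not free]
    test = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if free]
    r_work = r_factor([fo for fo, _ in work], [fc for _, fc in work])
    r_free = r_factor([fo for fo, _ in test], [fc for _, fc in test])
    return r_work, r_free
```

Because the test reflections never drive the refinement, an Rfree that stays close to R (as in the 0.22/0.28 and 0.24/0.29 pairs above) is the usual indication that the model is not overfitted.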

The TbpB structure was solved by molecular replacement using PDB code
3HOL. An initial model was created using the Swiss Model server and was
subsequently divided into four different search domains. PHASER-CCP4 was
used for molecular replacement, and subsequent refinement was performed using
PHENIX. The structure was solved to 2.40 Å resolution with final R/Rfree values
of 0.25/0.30.

The diferric hTF crystal structure was solved by molecular replacement using
PHASER-CCP4. Search models for the N lobe and C lobe were created separately
with the program Chainsaw (CCP4) using the existing diferric porcine TF
coordinates (PDB code 1H76). Six copies of each lobe (six molecules of hTF in total) were
found in the ASU and the iron sites were easily observed in the difference density.
These iron sites were further verified in an anomalous difference electron density
map. Refinement was performed using PHENIX and the structure was solved to
2.1 Å resolution with final R/Rfree values of 0.19/0.23.

The non-glycosylated hTF C-lobe structure was solved by molecular
replacement using PDB code 2HAU. An initial search model was formed by truncating
the N-lobe domain. PHASER-CCP4 was used for molecular replacement, and
subsequent refinement was performed using PHENIX. The structure was solved to
1.7 Å resolution with final R/Rfree values of 0.17/0.19. For all structures, figures
were made with PyMOL (Schrödinger) or Chimera and annotated and finalized
with Adobe Illustrator.
Dot blots. Whole cells (2 µl, 0.01 g ml−1) and cell lysates (unmodified for TbpB
samples, or incubated for 3 h with 2 mM EDTA and 1% DDM at room
temperature for TbpA samples) were spotted onto nitrocellulose membrane and allowed
to dry at room temperature. The membranes were then blocked with PBST 2% 
BSA for 15 min, washed and probed with HRP-conjugated hTF (1:1,000) 
(Jackson ImmunoResearch) for 15min. The membrane was then washed and 
imaged using the colorimetric substrate 3,3’-diaminobenzidine (Sigma) where 
the appearance of a red dot indicated specific binding of the hTF-HRP conjugate. 
The results from the mutants were compared to wild-type Tbp to determine their 
effect on hTF binding. 

ELISA. Whole cells (100 µl at 10 mg ml−1 or 1 mg ml−1 in PBS) carrying wild-type
TbpA, the empty vector control (pET20b) or TbpA mutants were added to a
NUNC polystyrene 96-well plate (Fisher Scientific) and incubated at 37 °C
overnight. Wells were washed 2× with PBST, blocked with PBST 2%
BSA for 30 min and probed with hTF-HRP (1:1,000) for 15 min. Wells were
washed 2× in PBST and 2× in PBS, and then developed using 100 µl
3,3′,5,5′-tetramethylbenzidine substrate (TMB, Sigma) for 5 min; the reaction was
terminated using Stop solution (Sigma). The absorbance of each well was determined
using a BioRad iMark plate reader at 450 nm, and data were normalized and compared
to wild-type TbpA. Each experiment was performed in triplicate and data are reported with
standard errors.
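The normalization to wild-type TbpA and the reporting of standard errors can be sketched as follows. This assumes triplicate A450 readings per construct and, optionally, subtraction of a negative-control background such as the empty-vector cells; the background step and the function name are assumptions rather than the authors' stated procedure.

```python
import statistics

def relative_to_wild_type(mutant_a450, wt_a450, background=None):
    """Mean and standard error of mutant binding as a percentage of wild type.

    mutant_a450, wt_a450: replicate A450 readings (e.g. triplicate wells).
    background: optional replicate readings from a negative control
    (hypothetically the empty-vector cells), subtracted before normalizing.
    """
    bg = statistics.mean(background) if background else 0.0
    wt_mean = statistics.mean(wt_a450) - bg
    pct = [100.0 * (x - bg) / wt_mean for x in mutant_a450]
    mean_pct = statistics.mean(pct)
    # Standard error of the mean = sample standard deviation / sqrt(n)
    sem_pct = statistics.stdev(pct) / len(pct) ** 0.5
    return mean_pct, sem_pct
```

For example, `relative_to_wild_type([0.42, 0.45, 0.40], [0.91, 0.88, 0.95])` would report a mutant that binds at roughly half the wild-type level; the readings in that call are invented for illustration.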

Antibody blocking assays. Using the TbpA-hTF crystal structure reported here,
we designed four different peptides based on four loops from TbpA (loops 3, 7, 11
and the plug loop) to be used as antigens for polyclonal antibody development
(Precision Antibody). A fifth polyclonal antibody was developed using purified
full-length TbpA (1× PBS, pH 7.4, 0.1% DDM). An ELISA was designed to probe
whether these antibodies could block hTF binding. Here, TbpA-His
(20 ng) was incubated for 20 min in a final volume of 100 µl either alone or in
the presence of each antibody (1:20) individually in PBS containing 0.05% Cymal-6
(Anatrace). In addition, we tested the antibodies that targeted TbpA loops in
combination to determine whether an additive effect could be observed. Each sample
was then transferred to a 96-well Ni-NTA Agarose HiSorb plate (Qiagen),
incubated for 30 min and washed 2× with PBST + 0.05% Cymal-6. Assays were
performed as described in the previous section. In a second set of ELISAs,
TbpA-His was first bound to the Ni-NTA Agarose HiSorb plate before incubation with
antibodies. Results were analysed and initial graphs made using Microsoft Excel.
The graphs were then imported, annotated and finalized with Adobe Illustrator.
Protease accessibility of TbpA and TbpA mutants. To confirm that TbpA and
the TbpA mutants were properly presented at the surface of the bacteria, we
treated whole cells with trypsin (5 µg ml−1 final concentration) for 15 min at
room temperature and stopped the reaction by adding AEBSF (0.2 mg ml−1
final concentration). The cells were then centrifuged and the supernatant
removed. The pellets were re-suspended in LDS loading buffer, boiled for
10 min, centrifuged for 10 min and separated on a NuPAGE Novex 4-12%
Bis-Tris gel. The samples were then transferred to a PVDF membrane using an
iBlot system (Invitrogen), and western blot analysis was performed using a polyclonal
anti-TbpA antibody (1:1,000) and a monoclonal anti-His HRP-conjugated
antibody (Sigma) (1:5,000). Here, each membrane was blocked with 2% BSA in
1× PBST for 15 min and then probed with either the anti-TbpA or the anti-His
HRP-conjugated antibody for 30 min. The anti-TbpA membrane was then washed 2×
with 2% BSA in 1× PBST and probed with anti-mouse HRP-conjugated
secondary antibodies for an additional 30 min. Both membranes were then
washed 2× with 1× PBST and 2× with 1× PBS, and imaged using the
colorimetric substrate 3,3′-diaminobenzidine (Sigma). The results from the
mutants were compared to wild-type TbpA to determine which constructs were
presented on the surface of the cells.

Transferrin competition assays. To determine whether the affinity of hTF for either
TbpA or TbpB is affected by the conformation or coordination state of the N lobe,
we performed an ELISA competition assay using apo-hTF, holo-hTF, hTF-FeN
(iron bound in the N lobe only) and hTF-FeC (iron bound in the C lobe only), which were
expressed in BHK cells and purified as described. Here, His-TbpA (20 ng) or
His-TbpB (20 ng) was incubated for 15 min in a final volume of 100 µl in a
96-well Ni-NTA HiSorb plate (Qiagen) in 1× PBS (0.05% Cymal-6 was added to all
buffers for His-TbpA). Wells were washed 2× with PBST, blocked with 2% BSA
in PBST for 15 min and then incubated with each of the transferrin constructs
(apo, holo, FeN, FeC) (100 ng each) for 15 min. Wells were washed again 2× with
PBST and then probed for 20 min with HRP-conjugated hTF (1:1,000) (Jackson
ImmunoResearch) in a 100 µl final volume. Wells were washed again 2× with
PBST and 2× with PBS, and developed using 100 µl 3,3′,5,5′-tetramethylbenzidine
substrate (Sigma) for ~5 min; the reaction was terminated using Stop solution (Sigma).
Data were collected and analysed as described above.

Small-angle X-ray scattering analysis. The TbpB-hTF complex was dialysed
overnight at 4 °C into TBS, pH 7.4 (25 mM Tris, 137 mM NaCl, 3 mM KCl) and
then filtered using a 0.2 µm spin filter. Data were collected at concentrations of 1,
2.5 and 5 mg ml−1 at Stanford Synchrotron Radiation Lightsource beamline BL4-2.
Data reduction and analysis were performed using the beamline software
SAStool. The program AutoGNOM of the ATSAS suite was used to generate
P(r) curves and to determine the maximum dimension (Dmax) and radius of
gyration (Rg) from the scattering intensity curve (I(q) versus q) in an automatic,
unbiased manner, and rounds of manual fitting in GNOM were used to verify
these values. Ab initio molecular envelopes were computed with the program
GASBOR. Ten GASBOR runs were averaged using DAMAVER.
Docking of the TbpB and diferric hTF crystal structures into the molecular
envelope was performed manually, guided by both previous docking studies
and mutagenesis results. Figures were made with PyMOL and annotated and
finalized with Adobe Illustrator.
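AutoGNOM obtains Rg from the P(r) distribution. A simpler, commonly used cross-check, swapped in here for illustration and not described in the text, is the Guinier approximation, ln I(q) = ln I(0) − q²Rg²/3, fitted at low q. The minimal sketch below fits it to synthetic data; the function name, the synthetic Rg of 35 Å and the q range are assumptions.

```python
import math

def guinier_rg(q, intensity):
    """Estimate Rg and I(0) from ln I(q) = ln I(0) - (Rg**2 / 3) * q**2.

    Ordinary least-squares line through ln I versus q**2. The approximation
    only holds at low q (commonly q*Rg < 1.3); choosing that range is left
    to the caller.
    """
    x = [qi * qi for qi in q]
    y = [math.log(ii) for ii in intensity]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    intercept = my - slope * mx
    rg = math.sqrt(-3.0 * slope)  # slope = -Rg**2 / 3
    return rg, math.exp(intercept)

# Synthetic check: an ideal particle with Rg = 35 angstroms and I(0) = 1000
q = [0.005 + 0.001 * i for i in range(30)]  # q in 1/angstrom; q*Rg stays below 1.2
i_q = [1000.0 * math.exp(-(qi * 35.0) ** 2 / 3) for qi in q]
rg, i0 = guinier_rg(q, i_q)
```

On real data, agreement between the Guinier Rg and the P(r)-derived Rg is a common sanity check that the sample is free of aggregation.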

Modelling the TbpA-TbpB-hTF triple complex. The in silico TbpA-TbpB-hTF
triple complex was assembled based on our crystal structures
(TbpA-(apo)hTF, diferric hTF, TbpB) and SAXS analysis (TbpB-(holo)hTF) reported
here. The TbpA-hTF crystal structure was aligned with our TbpB-hTF model
using the C1 subdomain of hTF as a reference, yielding a triple complex
containing TbpA, TbpB and hTF in a 1:1:1 ratio. Figures were made with PyMOL and/or
Maya (Autodesk) and annotated and finalized with Adobe Illustrator.
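Aligning the TbpA-hTF crystal structure with the TbpB-hTF model on the hTF C1 subdomain amounts to a least-squares superposition of matched atoms. The text does not name the superposition method; the minimal sketch below uses the Kabsch (SVD) algorithm, assuming matched Cα coordinates of the C1 subdomain are available from both models. The function name is hypothetical.

```python
import numpy as np

def kabsch_superpose(mobile, reference):
    """Return (rotation, translation, rmsd) superposing mobile onto reference.

    mobile, reference: (N, 3) arrays of matched coordinates, e.g. C-alpha
    atoms of the hTF C1 subdomain present in both models.
    """
    P = np.asarray(mobile, dtype=float)
    Q = np.asarray(reference, dtype=float)
    p0, q0 = P.mean(axis=0), Q.mean(axis=0)
    H = (P - p0).T @ (Q - q0)                # 3 x 3 covariance matrix
    U, S, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # avoid an improper rotation (reflection)
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    t = q0 - R @ p0
    fitted = P @ R.T + t
    rmsd = float(np.sqrt(np.mean(np.sum((fitted - Q) ** 2, axis=1))))
    return R, t, rmsd
```

The rotation `R` and translation `t` found on the C1 subdomain alone would then be applied to every atom of the TbpA-hTF model (`coords @ R.T + t`), carrying TbpA into the frame of the TbpB-hTF model.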

Electron microscopic analysis. The triple complex (TbpA-TbpB-(holo)hTF)
was prepared from separately purified components by first forming a complex
between TbpB and (holo)hTF, which was purified by size-exclusion
chromatography in 1× PBS. Cymal-6 was added to a final concentration of 0.05%, and
purified TbpA (in 1× PBS, 0.05% Cymal-6) was added to the mixture with an
excess of the TbpB-(holo)hTF complex. The triple complex (which retains a 1:1:1
stoichiometry) was isolated by size-exclusion chromatography in buffer A
(1× PBS, 0.05% Cymal-6) and used immediately for EM experiments. The
complex was diluted with buffer A to an optimal concentration for EM
(determined empirically to be ~1 µg ml−1). Drops (4 µl each) were applied to
carbon-coated, glow-discharged EM grids (EMS). After 1 min, the grid was blotted,
washed twice with buffer A and once with distilled water, and then stained with
2% uranyl acetate. Grids were observed with a CM120 LaB6 electron microscope
(FEI), operating at 120 kV. Micrographs were recorded on SO-163 film (Kodak) at
a nominal magnification of 45,000×, and digitized on a Nikon Coolscan 9000 at a
rate corresponding to 1.55 Å per pixel. The large majority of particles were distributed
evenly on the grid and were essentially uniform in size (~90-110 Å in diameter),
indicative of a homogeneous population.

The particles were variable in substructure, suggesting that the molecules
deposit on the grid in a variety of orientations. Accordingly, a data set of 4,240
particles was subjected to 'reference-free' classification using SPIDER,
EMAN and Bsoft. Images were picked using a 256 × 256 pixel box and binned
by a factor of four (to 6.2 Å per pixel) to increase the signal-to-noise ratio and the
speed of calculation. Initial reference-free classification and averaging were performed
using EMAN; further classification was done in SPIDER, using principal
component analysis (PCA) with three cycles of iteration. We chose to obtain 56 final
class averages, based on a cluster distribution obtained from PCA.
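The binning step and the resulting pixel size can be illustrated with block averaging. This is a minimal sketch assuming simple mean pooling of non-overlapping 4 × 4 blocks; SPIDER, EMAN and Bsoft provide their own implementations, and `bin_image` is a hypothetical helper.

```python
import numpy as np

def bin_image(img, factor):
    """Downsample a 2D image by averaging non-overlapping factor x factor blocks."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    if h % factor or w % factor:
        raise ValueError("image dimensions must be divisible by the binning factor")
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

pixel_size = 1.55                    # angstroms per pixel after digitization (from the text)
particle_box = np.zeros((256, 256))  # placeholder for one boxed particle image
binned = bin_image(particle_box, 4)  # 64 x 64 pixels
binned_pixel_size = pixel_size * 4   # 6.2 angstroms per pixel, as stated in the text
box_width = 256 * pixel_size         # about 397 angstroms, ample for ~90-110 angstrom particles
```

Averaging each 4 × 4 block reduces the pixel noise roughly fourfold (if the noise is uncorrelated) and shrinks the data 16-fold, at the cost of a coarser Nyquist limit (12.4 Å), which is still finer than the 15 Å filtering applied to the model.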

The coordinates of the modelled triple complex were converted to a density map
and low-pass filtered to 15 Å. The sampling rate of the density map was set to be the same
as that of the EM images, and two-dimensional projections were calculated at angular
increments of 30° (Supplementary Fig. 14b). Comparisons and matching between the EM
class averages and the model re-projections were done in terms of cross-correlation
coefficients. A few ambiguous cases were reassigned by visual assessment. Figures
were made with Chimera and annotated and finalized with Adobe Illustrator.
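Matching class averages to model re-projections by cross-correlation can be sketched as below. This simplified version assumes the images are already mutually aligned and sampled identically; a real comparison would also search over in-plane rotations and shifts, which is omitted here, and the function names are hypothetical.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation coefficient between two equally sized images."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def best_matching_projection(class_average, projections):
    """Index and score of the projection that correlates best with a class average."""
    scores = [ncc(class_average, p) for p in projections]
    best = int(np.argmax(scores))
    return best, scores[best]
```

Because the coefficient is computed after subtracting the mean and dividing by the norms, it is insensitive to differences in overall contrast and brightness between the stained EM images and the model projections.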
Molecular dynamics simulations. For simulations of TbpA bound to apo-hTF, a
membrane-water system containing the protein complex was first built using
VMD. The complex was placed in a DMPE bilayer as used previously, with
the barrel of TbpA aligned with the membrane's hydrophobic core, and then fully
solvated. Disulphide bonds for three pairs in TbpA and 19 pairs in hTF were
added based on S-S proximity. Ca2+ and Cl− ions were added to a concentration
of 100 mM, resulting in an initial size of 264,000 atoms. The system was
equilibrated in stages for 13.5 ns, including 10 ns of fully unrestrained dynamics. The
simulations were run using NAMD 2.7 in the NPT ensemble at a temperature of
310 K and a pressure of 1 atm; after the first 3.5 ns of equilibration, the area of the
membrane was fixed. Other simulation parameters were set identically to those
used previously. For steered molecular dynamics (SMD) simulations, the Cα
atom of the N-terminal residue of the TbpA plug domain was pulled in the −z direction,
away from the membrane, at a constant velocity of 5 Å ns−1 (refs 33, 49). To
counterbalance the pulling forces, six residues at the extracellular periphery of the
barrel domain were restrained in the z direction. An adaptive procedure was used
to limit the maximum required system size during SMD simulations. When the
extension of the unfolded region of the plug domain brought it near to the
periodic boundary, the simulation was stopped, the unfolded region of the plug
domain distant from the barrel and membrane was cleaved, and the simulation was
restarted after a short equilibration of the water, with the new N-terminal residue
being pulled. With this procedure, applied three times, approximately 150 Å of
pulling was accomplished while keeping the system sizes below 300,000 atoms.
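In constant-velocity SMD, the pulled atom is tethered to a harmonic spring whose restraint point moves at a fixed speed, so the applied force grows whenever the atom lags behind. A minimal one-dimensional sketch follows; the spring constant is not given in the text, so the value used in any call is an arbitrary assumption, and the function name is hypothetical. The same numbers show that 150 Å of pulling at 5 Å ns−1 corresponds to about 30 ns of pulling in total.

```python
def smd_spring_force(k, v, t, x, x0=0.0):
    """Force along the pulling direction from a constant-velocity SMD spring.

    The restraint point starts at x0 and moves at speed v, so at time t it sits
    at x0 + v*t; the pulled atom at position x feels F = k * (x0 + v*t - x).
    Units are whatever the caller uses consistently (e.g. k in kcal/mol/A^2,
    v in A/ns, t in ns, x in A gives F in kcal/mol/A).
    """
    return k * (x0 + v * t - x)

pull_speed = 5.0                           # angstroms per ns, as in the text
total_pull = 150.0                         # angstroms, as in the text
pulling_time_ns = total_pull / pull_speed  # total pulling time in ns
```

Along the −z direction used here, x would be the displacement of the pulled Cα atom measured away from the membrane.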
Sequence analysis and alignments. Sequence analysis and alignments were
performed and analysed with the programs STRAP and JalView. Figures were
annotated and finalized with Adobe Illustrator.
Medulloblastoma, the most common malignant paediatric brain
tumour, arises in the cerebellum and disseminates through the
cerebrospinal fluid in the leptomeningeal space to coat the brain and
spinal cord. Dissemination, a marker of poor prognosis, is found
in up to 40% of children at diagnosis and in most children at the time
of recurrence. Affected children are therefore treated with radiation
to the entire developing brain and spinal cord, followed by high-dose
chemotherapy, with the ensuing deleterious effects on the developing
nervous system. The mechanisms of dissemination through
the cerebrospinal fluid are poorly studied, and medulloblastoma
metastases have been assumed to be biologically similar to the
primary tumour. Here we show that in both mouse and human
medulloblastoma, the metastases from an individual are extremely
similar to each other but are divergent from the matched primary
tumour. Clonal genetic events in the metastases can be
demonstrated in a restricted subclone of the primary tumour, suggesting
that only rare cells within the primary tumour have the ability to
metastasize. Failure to account for the bicompartmental nature of
metastatic medulloblastoma could be a major barrier to the
development of effective targeted therapies.

Thirty percent of patched-1-heterozygous (Ptch+/−) mice develop
non-disseminated medulloblastoma by 8 months of age. Recently, the
Sleeping Beauty (SB) transposon system was shown to be an effective
tool for functional genomics studies of solid tumour initiation and
progression. We expressed the SB11 transposase in cerebellar
progenitor cells in transgenic mice under the Math1 (also known as
Atoh1) enhancer/promoter, but we did not observe any tumours when
these mice were bred with mice transgenic for a concatemer of the
T2/Onc transposon (Fig. 1a-j and Supplementary Figs 1 and 2). However,
on a Ptch+/− background, these Math1-SB11/T2Onc mice showed
increased penetrance of medulloblastoma (~97%; 271 of 279 mice)
compared with controls (~39%; 54 of 139 mice), as well as decreased
latency (2.5 months compared with 8 months) (Fig. 1 and Supplementary
Fig. 2). Although Ptch+/− medulloblastomas are usually localized,
the addition of SB transposition results in metastatic dissemination
through the cerebrospinal fluid pathways, identical to the pattern that
is seen in human children (Fisher’s exact test, P= 1.8 X 10”, odds 
ratio = 5.2; Supplementary Table 1) (Fig. 1c, d and Supplementary Fig. 2).
As neither transposon nor transposase alone had an effect on tumour
incidence, latency or dissemination, we conclude that SB-induced
insertional mutagenesis drives medulloblastoma progression on the
Ptch+/− background (Fig. 1i and Supplementary Fig. 2).
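Contrasts such as the penetrance comparison above are 2 × 2 tables of the kind Fisher's exact test addresses. A minimal pure-Python sketch of the two-sided test is given below. For illustration only, it is applied to the penetrance counts quoted in the text (271 of 279 versus 54 of 139 mice), not to the dissemination data behind the reported P value and odds ratio, which are in Supplementary Table 1. The function name is hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Odds ratio and two-sided Fisher's exact P for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same row and
    column totals that is no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def p_table(x):
        # Probability that the top-left cell equals x, given fixed margins
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = p_table(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are counted
    p = sum(p_table(x) for x in range(lo, hi + 1) if p_table(x) <= p_obs * (1 + 1e-7))
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(p, 1.0)

# Illustration only: medulloblastoma vs no medulloblastoma, with vs without
# SB transposition on the Ptch+/- background (counts from the text)
or_penetrance, p_penetrance = fisher_exact_two_sided(271, 279 - 271, 54, 139 - 54)
```

An odds ratio above 1 here indicates that tumours are more frequent in the transposon-carrying mice; the exact test is preferred over a chi-squared approximation when some cells (here, 8 tumour-free mice) are small.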

Humans with germline mutations in the tumour-suppressor gene
TP53 have Li-Fraumeni syndrome and an increased risk of
developing medulloblastoma. Although no medulloblastomas were
found in mice with mutant Tp53 (also known as Trp53) (denoted
Tp53mut mice, which include Tp53+/− and Tp53−/− mice), 40% of
Tp53mut/Math1-SB11/T2Onc mice developed disseminated
medulloblastoma (Fig. 1e-h, j and Supplementary Fig. 2). Human
medulloblastomas with TP53 mutations frequently have large cell/anaplastic
histology. Tp53mut/Math1-SB11/T2Onc medulloblastomas have large
cells, nuclear atypia and nuclear moulding, which are typical of large cell/anaplastic
histology (Fig. 1f). We conclude that SB transposition can
drive the initiation and progression of metastatic medulloblastoma on
a Tp53mut background.

We used linker-mediated PCR and 454 sequencing to identify the
sites of T2/Onc insertions in Ptch+/−/Math1-SB11/T2Onc and
Tp53mut/Math1-SB11/T2Onc primary medulloblastomas and their
matched metastases. Genes that contained insertions statistically more
frequently than the background rate were identified as gene-centric
common insertion sites (gCISs). We identified 359 gCISs in 139
primary tumours on the Ptch+/− background and 26 gCISs in 36
primary medulloblastomas on the Tp53mut background (Supplementary
Tables 2-7 and Supplementary Figs 3-5). A large number of gCISs
were candidate medulloblastoma oncogenes or tumour-suppressor
genes (Supplementary Table 8). Insertions in candidate
tumour-suppressor genes, including Ehmt1, Crebbp and Mxi1, are predicted
to cause a loss of function (Fig. 1k-m), whereas insertions in putative
medulloblastoma oncogenes are largely gain of function, as exemplified
by Myst3 (Fig. 1n).
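The text identifies gCISs as genes with more insertions than the background rate but does not spell out the statistic here. As a hypothetical, simplified stand-in (not the published gCIS method), one can ask whether a gene's insertion count exceeds what its share of the genome's TA dinucleotides (the SB target sites) predicts, using a one-sided binomial tail. All function and parameter names are illustrative.

```python
import math

def log_binom_pmf(i, n, p):
    """Natural log of the Binomial(n, p) probability of exactly i successes."""
    return (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p))

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed upward from k in log space."""
    total = 0.0
    for i in range(k, n + 1):
        term = math.exp(log_binom_pmf(i, n, p))
        total += term
        # Beyond the mode (about n*p) the terms only shrink; stop once negligible.
        if i > n * p and term <= total * 1e-17:
            break
    return min(total, 1.0)

def gene_enrichment_p(insertions_in_gene, total_insertions, ta_in_gene, ta_in_genome):
    """Hypothetical test: more insertions in a gene than its TA-site share predicts?"""
    expected_fraction = ta_in_gene / ta_in_genome
    return binomial_upper_tail(insertions_in_gene, total_insertions, expected_fraction)
```

In a genome-wide screen such a per-gene P value would still need correction for the thousands of genes tested before a gene is called a common insertion site.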

Many gCISs mapped to regions of amplification, focal hemizygous
deletion and homozygous deletion (Supplementary Table 8) that we
recently reported in the genome of a large cohort of human
medulloblastomas. There is a high level of overlap between gCISs and known
cancer genes (in the Catalogue of Somatic Mutations in Cancer
(COSMIC) database) (Supplementary Tables 9 and 10), suggesting
that many gCISs are bona fide driver genes in medulloblastoma
(Fisher's exact test, P = 0.0012). Similarly, many of the mouse
gCISs and the genes amplified in human medulloblastomas are
overexpressed in human SHH-driven medulloblastomas (Supplementary
Fig. 6). Conversely, mouse gCISs hemizygously deleted in human
medulloblastomas were frequently expressed at a lower level in human
medulloblastomas (Supplementary Fig. 6). The expression of six out of
seven gCISs that had been studied by immunohistochemistry on a
human medulloblastoma tissue microarray was associated with
significantly worse overall and progression-free survival in human
medulloblastoma (Supplementary Table 11 and Supplementary
Figs 7 and 8). We conclude that our SB-driven leptomeningeal-disseminated
medulloblastoma model resembles the human disease
anatomically, pathologically and genetically and thus is an accurate
model of the human disease that can be used to identify candidate driver
events and understand the pathogenesis of human medulloblastoma.

We compared the gCISs identified from Ptch+/−/Math1-SB11/T2Onc
and Tp53mut/Math1-SB11/T2Onc primary medulloblastomas
and matched metastases (Supplementary Table 2). Strikingly, the
overlap between primary tumour gCISs (pri-gCISs) from Ptch+/−/Math1-SB11/T2Onc
tumours and those from metastases (met-gCISs) from
the same animals was only 9.3% of all gCISs (Fig. 2a). Similarly, the
overlap between pri-gCISs from Tp53mut/Math1-SB11/T2Onc mice
and the matching met-gCISs was only 8.9% (Fig. 2b). The leptomeningeal
metastases and the matched primary tumour have identical, highly
clonal insertion sites on both genetic backgrounds (Fig. 2c). The
probability of two (or three) unrelated tumours having SB insertions
in exactly the same TA dinucleotide is extremely low. We conclude that
the leptomeningeal metastases and the matched primary tumour arise
from a common transformed progenitor cell and have subsequently
undergone genetic divergence.
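One reading of "only 9.3% of all gCISs" is the number of gCISs shared by primary tumours and metastases divided by the number of distinct gCISs found in either. That interpretation is an assumption, and the gene sets used with the sketch below are hypothetical placeholders, not data from the study.

```python
def overlap_fraction(set_a, set_b):
    """Shared elements as a fraction of all distinct elements: |A & B| / |A | B|."""
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0

# Hypothetical placeholder sets, not the study's gCIS lists
pri_gcis = {"geneA", "geneB", "geneC"}
met_gcis = {"geneC", "geneD"}
shared_fraction = overlap_fraction(pri_gcis, met_gcis)  # 1 shared of 4 distinct
```

Under this reading, a value near 9% means that more than nine in ten gCISs were found in only one of the two compartments.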

Sequencing also identified insertions that are highly clonal in the 
metastases but are not observed in the matched primary tumour (data 
not shown). End-point PCR for these insertions in the matched primary 
and metastatic tumours shows that the insertion is highly clonal in the 
metastasis (or metastases) and is present in a very small subclone of the 
primary tumour (Fig. 2d and Supplementary Fig. 9). These data are 
consistent with a model in which metastatic disease arises from a minor 
restricted subclone of the primary tumour. Dissemination could occur 
repeatedly from the same subclone of the primary tumour, which seeds 
the rest of the central nervous system, or it could occur once, followed 
by reseeding of the rest of the leptomeningeal space by the initial 
metastasis. Insertions that are restricted to a minor subclone of the 


Figure 1 | Transposon mutagenesis models of disseminated human 
medulloblastoma. a-d, The histology of transposon-driven medulloblastoma 
on the Ptch*/~ background resembles human medulloblastoma, with 
leptomeningeal metastases on the surface of the brain (c) and spinal cord 

(d). Images show haematoxylin and eosin staining (a, entire brain; b, upper 
spinal cord). e-h, The histology of transposon-driven medulloblastoma on the 
Tp53™ background shows histological features of large cell/anaplastic 
medulloblastoma, including nuclear pleomorphism and nuclear wrapping 

(f). Dissemination to the leptomeningeal spaces of the brain (g) and spinal cord 
(h) also occurs on this background. i, Ptch*/~ mice with SB transposition 
develop more frequent medulloblastomas with a shorter latency than Ptch*!~ 
mice without transposition. P values are from t-tests of survival comparing 
individual genotypes to Ptch*/~ mice; n, number of mice per genotype. mo., 
months. j, Medulloblastoma (MB) was not observed in Tp53™"* mice without 
transposition but was observed in 42% of Tp53™™' mice with transposition. P 
values are from f-tests comparing survival between Tp53™™' mice and Tp53™'/ 
SB11/T2Onc mice with MB; n, number of mice. kn, Insertion maps of notable 
gCISs. Insertions in the direction of transcription are denoted by green arrows, 
and those against the direction of transcription are denoted by red arrows. 
Transcription start sites are denoted by black arrows. 
Figure 2 | Transposon-driven metastatic medulloblastoma genetically
differs from the primary tumour. a, b, Venn diagrams depicting the degree of
overlap and discordance in the gCISs in primary tumours and metastases, on
the Ptch+/− and Tp53™™ backgrounds. c-f, Insertion-site end-point PCR was
used to demonstrate the relative clonality of insertions between samples. Data
for medulloblastoma in five mice are shown (mouse 143, left; and four mice,
right). Three levels of input DNA were used for each sample (1×, 5× and 25×,
with the increase depicted by a wedge). Shown are clonal events found in both
the primary tumour and matching metastases (met) (c), insertions that are
highly clonal in the metastases but very subclonal in the matching primary
tumour (d), insertions that are highly clonal in the metastases but undetectable
in the matching primary tumour (e), and insertions that are highly clonal in the
primary tumour but undetectable in the matching metastases (f). NC, negative
control; genomic DNA from a Math1-SB11/T2Onc double-transgenic mouse
cerebellum.


primary tumour but that are clonal in the metastases could correspond
to the previously described 'metastasis virulence' genes, which offer a
genetic advantage during dissemination but not to the primary
tumour. Another explanation for our data could be that the primary
tumour was reseeded by a metastatic clone that had acquired
additional genetic events in the periphery. This hypothesis is made less
likely by the presence of highly clonal insertions in the metastasis that are
completely absent from the primary tumour in the same animal (Fig. 2e).
As reseeding should be accompanied by contamination of the primary
tumour with events found in the metastases, the absence of these events
in the matched primary tumour makes reseeding much less likely
(Fig. 2e). We propose that events found in only one metastasis represent
progression events that are acquired after metastasis and that could
lead to localized progression of metastatic disease, as is sometimes
observed in children.

We observed highly clonal insertions in the primary tumour,
including in known medulloblastoma oncogenes such as Notch2 and
Tert, that were not found in the matching metastases (Fig. 2f). This
pattern could be explained by remobilization of the SB transposon
in the metastatic tumour; however, no signs of the DNA footprint that
remains after SB remobilization were observed at these loci
(Supplementary Fig. 10). We suggest that these events, which may
constitute driver events in the primary tumour, arose in the primary
tumour after the metastases had disseminated (post-dispersion events).
Although these known oncogenes are attractive targets for therapy,
their utility as targets may be limited if they are not also found
in the leptomeningeal compartment of the disease. Our data from two
separate mouse lines support a model in which medulloblastoma
disseminates early from a restricted subclone of the primary tumour
and in which the primary tumour and the matched metastases then
undergo differential clonal selection and evolution. Failure to account
for the differences between the primary and leptomeningeal
compartments could lead to the failure of targeted therapies. Failure to study the
leptomeningeal disease (Fig. 2d, e) could result in crucial targets for
therapy in this compartment being systematically overlooked.

Examining the met-gCISs using Gene Set Enrichment Analysis 
(GSEA) demonstrated differences between the primary and metastatic 
disease, including enrichment for genes involved in the cytoskeleton in 
metastases (Supplementary Table 12). Targets that are present in both 
compartments and are maintenance genes, as exemplified by Pdgfra, 
will be optimal targets for treating both the primary tumour and the 
metastases (Fig. 2c and Supplementary Tables 7 and 9). 

Pten, Akt2, Igf2 and Pik3r1 are all met-gCISs, implicating the
phosphatidylinositol-3-OH kinase (PI(3)K) pathway in medulloblastoma
progression. We injected the cerebellum of Nestin-TVA mice
with either an Shh-overexpressing retroviral vector (denoted Shh
virus) or an Shh- and Akt-overexpressing retroviral vector (denoted
Shh + Akt virus). Cerebellar injection of Shh virus alone resulted in
medulloblastomas in 6 of 41 animals, compared with 20 of 42 animals
injected with Shh + Akt virus (P = 0.0018). Although metastases
were not observed with Shh virus alone (0 of 41), medulloblastoma
metastases were observed in 9 of 42 animals injected with Shh + Akt
virus (P = 0.0024) (Supplementary Fig. 11). This in vivo modelling
supports PI(3)K signalling as a putative contributor to leptomeningeal
dissemination of medulloblastoma.
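
The two incidence comparisons above are 2×2 tables of counts. The test behind the quoted P values is not named here, but a two-sided Fisher's exact test on these counts gives values that round to the reported P = 0.0018 and P = 0.0024, so this minimal standard-library sketch assumes that test. The helper `fisher_exact_two_sided` is hypothetical, not the authors' code.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are conditions (Shh virus, Shh + Akt virus); columns are outcome
    present / absent. Margins are held fixed, and the P value is the total
    probability of every table that is no more likely than the observed one.
    """
    row1 = a + b          # animals in the first condition
    col1 = a + c          # animals with the outcome in either condition
    n = a + b + c + d     # all animals
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - (n - row1))
    hi = min(row1, col1)
    # A small relative tolerance guards against floating-point ties
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


# Medulloblastoma incidence: Shh 6 of 41 versus Shh + Akt 20 of 42
p_tumour = fisher_exact_two_sided(6, 41 - 6, 20, 42 - 20)
# Metastasis incidence: Shh 0 of 41 versus Shh + Akt 9 of 42
p_metastasis = fisher_exact_two_sided(0, 41, 9, 42 - 9)
print(f"tumour P = {p_tumour:.4f}, metastasis P = {p_metastasis:.4f}")
```

Because the margins are held fixed, the test stays exact for sparse tables such as 0 of 41, where a chi-squared approximation would be unreliable.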

Previous publications and clinical approaches to human
medulloblastoma have largely assumed that the primary tumour and its
matched metastases are highly similar. To test this assertion, we
formally reviewed all cases of medulloblastoma from the past decade
at The Hospital for Sick Children in Toronto, Ontario, Canada, and we
identified 19 patients who had bulk residual primary tumour after surgery
and metastases visible by magnetic resonance imaging, both of which
could be followed for response to treatment (Supplementary Fig. 12
and Supplementary Table 13). Although it is possible that the
metastases received less radiotherapy than the primary tumour in a subset
of patients, in 58% of all cases (11 of 19) we observed a disparate
response to therapy between the primary tumour and the matched
metastases (binomial test, P < 2.2 × 10⁻¹⁶). Identification of definitive
differences in the clinical response to standard therapy between the
primary and the metastatic compartment awaits the completion of
large, well-controlled, prospective clinical trials.

We examined seven matched primary and metastatic medulloblastomas
for copy number aberrations (Fig. 3, Supplementary Figs 13 and
14, and Supplementary Tables 14 and 15). In each case, the primary
tumour and the matched metastases shared complex genetic events
that provide strong support for their descent from a common
transformed progenitor cell. Similar to our mouse data, in each case we
observed clonal genetic events in the metastatic tumour(s) that were
not present in the matched primary tumour (Fig. 3 and Supplementary
Fig. 14). We also observed genetic events in the primary tumour that
were absent from the matched metastases, consistent with a
post-dispersion event (Fig. 3 and Supplementary Fig. 14). One patient with
multiple leptomeningeal metastases had a deletion of chromosome 1p
in only one of three examined metastases (Fig. 3a). This pattern of
genetic events being present in only a subset of metastases could be a
mechanism for the emergence of therapy-resistant metastatic clones.

We performed interphase fluorescence in situ hybridization (FISH)
for the known medulloblastoma oncogenes MYCN and MYC on a
collection of 17 paraffin-embedded primary and metastatic pairs of
human medulloblastomas. MYCN was amplified in three primary
medulloblastomas but not in the matching metastases (Fig. 3b and
Figure 3 | Human medulloblastoma metastases are biologically distinct
from their matched primary tumour. a, Copy number data from a primary
medulloblastoma (MB-C-Pri) and three patient-matched metastases
(MB-C-Met1, MB-C-Met2 and MB-C-Met3), with chromosomal regions in red
representing genetic gain (amplification) and in blue denoting genetic loss
(deletion). Examples of shared clonal events (red boxes) and events limited to
one but not all metastases (black box) are shown. Chr, chromosome.
b, Interphase FISH shows amplification of MYCN in a primary tumour but not
the matched metastasis. Nuclei appear blue owing to
4′,6-diamidino-2-phenylindole (DAPI) staining. c, Interphase FISH for MYC demonstrates
amplification in both the primary tumour and its matched metastases. d, Venn
diagrams depicting the degree of overlap and discordance in promoter CpG
methylation events and CNAs in primary medulloblastomas and their matched
metastases, with MB-C, MB-D, MB-F and MB-H denoting different patients.




Supplementary Fig. 15). Conversely, MYC was amplified in two
primary tumours and their matching metastases (Fig. 3c). These data
are consistent with MYCN amplification being a post-dispersion event,
similar to examples in SB-driven mouse medulloblastoma, and
strongly indicate that anti-MYCN therapeutics may lack efficacy in
the metastatic compartment of human medulloblastoma. The possibility
that MYCN amplicons in the metastases have been 'lost' over time
cannot be excluded.

We subsequently analysed promoter CpG methylation in these
matched pairs and found much discordance between the primary
tumour and matched metastases (Fig. 3d, Supplementary Figs 13
and 16, and Supplementary Tables 16 and 17). Finally, we performed
whole-exome sequencing on a limited set of matched primary and
metastatic medulloblastomas and found many single nucleotide
variants (SNVs) that were restricted to a single compartment
(Supplementary Fig. 13 and Supplementary Table 18). The discordance
of CNAs, promoter CpG methylation events and SNVs between the
primary tumour and its matched metastases supports a
bicompartmental model for metastatic medulloblastoma. The mutational load in
the human tumours (the combination of CNAs, CpG methylation and
SNVs) is comparable to the mutational load in our transposon-driven
mouse models (in which the median number of gCISs is 25 per
tumour; Supplementary Table 19). Validation of the individual CNAs
that were restricted to the metastases showed that these CNAs can be
detected in a very minor subclone of the primary tumour, in keeping
with the relationship identified in the mouse model (Supplementary
Fig. 17 and Supplementary Tables 20 and 21). Pathway analysis using
the Database for Annotation, Visualization and Integrated Discovery
(DAVID) to compare mouse gCISs with the genes that were affected in
the human metastases identified only one statistically significant
shared signalling pathway: insulin signalling (P = 0.027)
(Supplementary Table 22). The known role of insulin receptor signalling in primary
medulloblastoma, together with the data presented here on the role of
AKT in metastatic medulloblastoma, suggests that insulin signalling
should be prioritized as a therapeutic target to be tested in clinical trials.

We performed unsupervised hierarchical clustering on the CpG 
methylation data, and we found that normal cerebellar controls cluster 
away from the medulloblastomas, whereas metastases cluster with 
their matching primary tumour (Fig. 4a). However, metastases cluster 
Figure 4 | Human medulloblastoma metastases are genetically distinct from
their matched primary tumour. a, Profiling the methylation status of 27,578
CpG dinucleotide sites in the human genome in a collection of human matched
primary and metastatic medulloblastomas; the top 2,000 genes are shown.
Unsupervised hierarchical clustering by CpG methylation pattern
demonstrates that patient-matched metastases are more similar to each other
than to the matched primary tumour. b, Unsupervised clustering of regions of
copy number gain and loss demonstrates that patient-matched metastases are
more similar to each other than to the matched primary tumour.
c, Unsupervised hierarchical clustering of SNV data from whole-exome
sequencing demonstrates that patient-matched metastases are more similar to
each other than to the matched primary tumour. SNVs that are found only in
the primary compartment or only in both examined tumours in the metastatic
compartment are evident. Coph, cophenetic correlation coefficient.
closer to each other than they do to the matched primary tumour
(z-test, P = 0.0014) (Supplementary Fig. 18). Unsupervised hierarchical
clustering of CNA and exome SNV data uncovered the same
relationships (Fig. 4b, c). Evident within the exome data are many events that
are shared only by patient-matched metastases (that is, metastases
from a single patient), as well as events that are restricted to the primary
tumour, both of which are similar to the genetic patterns observed in
mice. These three data sets support a model in which patient-matched
human medulloblastoma metastases are epigenetically and genetically
very similar to each other but have substantially diverged from the
primary tumour, resulting in two different disease compartments:
the primary and metastatic compartments.

Our data from two mouse models, with support from initial data
from human medulloblastoma, suggest that leptomeningeal metastases
of medulloblastoma from a single human or mouse are genetically
similar to each other but are highly divergent from the matched primary
tumour, consistent with a bicompartmental model of disease. Our
results are consistent with a model in which metastases arise from a
restricted subclone of the primary tumour through a process of clonal
selection in both humans and mice. That metastases might arise from a
pre-existing minor subclone of the primary tumour through clonal
selection was suggested more than three decades ago, but it remains a
controversial hypothesis that might not be true of all cancers.
Failure to account for the divergent molecular pathology of the
metastatic compartment may result in the selection of therapeutic targets
that are present in the primary tumour, which is more amenable to surgical
control, but not in the metastases, which are the more frequent cause of death.


METHODS SUMMARY 

Generation of Math1-SB11 construct. SB11 cDNA was excised from the vector 
pCMV-SB11 and ligated into the vector J2Q-Math1 (refs 8, 26). 
Linker-mediated PCR and 454 deep sequencing. Bar-coded, linker-mediated
PCR was performed as previously described. Sample preparation for the 454
sequencing and the subsequent procedures was performed as previously
described.

Determination of gCISs. A chi-squared analysis was performed to determine 
whether the number of observed integration events within each transcription unit 
in the SB-driven medulloblastomas was significantly greater than expected given 
the following: the number of TA dinucleotide sites within the gene relative to the 
number of TA sites in the genome, the number of integration sites within each 
tumour, and the total number of tumours in each cohort. This gCIS analysis 
produced a P value for each of the ~19,000 mouse RefSeq genes, and Bonferroni 
correction was therefore used to adjust for multiple hypothesis testing. 
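
A simplified sketch of the gCIS test described above, under stated assumptions: insertions are pooled across the tumours of a cohort (so the expected count for a gene is the pooled insertion total multiplied by the gene's share of genomic TA sites), a one-degree-of-freedom chi-squared goodness-of-fit statistic is used, only an excess of insertions counts, and the Bonferroni adjustment multiplies each P value by the number of genes tested. The function name `gcis_p_value` and the example numbers are hypothetical.

```python
from math import erfc, sqrt


def gcis_p_value(observed, gene_ta_sites, genome_ta_sites,
                 total_insertions, n_genes=19_000):
    """Return (raw P, Bonferroni-adjusted P) for excess insertions in one gene.

    Hypothetical simplification: insertions are pooled over all tumours in a
    cohort, and SB integration is assumed to be uniform over TA dinucleotides.
    """
    fraction = gene_ta_sites / genome_ta_sites
    expected_in = total_insertions * fraction        # expected hits in the gene
    expected_out = total_insertions - expected_in    # expected hits elsewhere
    observed_out = total_insertions - observed
    chi2 = ((observed - expected_in) ** 2 / expected_in
            + (observed_out - expected_out) ** 2 / expected_out)
    # Upper tail of a chi-squared distribution with 1 degree of freedom
    p_raw = erfc(sqrt(chi2 / 2))
    if observed <= expected_in:
        p_raw = 1.0  # a deficit of insertions is not a common insertion site
    p_adj = min(1.0, p_raw * n_genes)                # Bonferroni correction
    return p_raw, p_adj


# Hypothetical gene holding 0.01% of genomic TA sites, hit 10 times in 10,000 insertions
print(gcis_p_value(observed=10, gene_ta_sites=1_000,
                   genome_ta_sites=10_000_000, total_insertions=10_000))
```

With about 19,000 genes tested, a raw P value has to fall below roughly 2.6 × 10⁻⁶ to survive a Bonferroni-adjusted threshold of 0.05.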


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS

Linker-mediated PCR and 454 deep sequencing. Genomic DNA was isolated
and purified from mouse tissues with a DNeasy Blood & Tissue Kit (QIAGEN).
The subsequent bar-coded, linker-mediated PCR was performed as previously
described. Sample preparation for the 454 sequencing and the subsequent
procedures was performed as previously described.

PCR for SB-tagged fragments. The primers for amplifying SB-transposon
insertion sites were designed based on the chromosomal location of each independent
insertion site and its orientation to transcription. The primers at the inverted
repeats/direct repeats (left) (IRDRL) and inverted repeats/direct repeats (right)
(IRDRR) of the transposon were 5'-CTGGGAATGTGATGAAAGAAATAAAA-3'
and 5'-TTGTGTCATGCACAAAGTAGATGT-3', respectively. The input
represents genomic DNA with SB transposition, which was illustrated by SB excision
PCR that detected the transposon post transposition. Three levels of input
(1×, 5× and 25×) were used. The following primers were used: Pde4d-143L,
5'-CACATAAAAACTGGACACCTAG-3'; Pdgfra-131R, 5'-CTATCATGACCACACGGAAGAGAGTGAAC-3';
Dnajb11-143L, 5'-CATGAGCTATGGCACAGATAC-3'; Fubp1-143R,
5'-CACTAGTGCCCATGGATTAGG-3'; Ptges-143R, 5'-CAGAACTGATAGAGGCCAAAG-3';
Irx2-25L, 5'-CAACACTTTCAGACACACATATATC-3'; Igf2-112R,
5'-GTGACCAGTGTGTATTCGTGGAATTTTTTGGG-3'; and Notch2-114R,
5'-CAGTGTCCAGGCAGTCATTTCAAAGAGTG-3'. Details about the primer design
for specific insertion sites and the PCR protocol are available on request.

Review of clinical cases. We systematically reviewed all cases of medulloblastoma
seen at The Hospital for Sick Children (Toronto, Ontario) over the past ten years.
Cases that had both metastases and post-operative residual bulky disease at the
primary site were identified on the basis of post-operative imaging obtained within
72 h of surgery. All radiology results were reviewed by a senior neuro-oncologist
(E.B.). Objective responses of both the primary tumour and the metastatic disease
were measured using the standard International Society of Paediatric Oncology
(SIOP) criteria for clinical trials of paediatric brain tumours (ref. 28).

End-point PCR on human samples. For PCRs to confirm the deletion of the
CDKN2A locus on chromosome 9, a genome-walking approach (GenomeWalker
Universal Kit, Clontech, Catalogue number 638904) was taken to locate the
specific deletion region based on single nucleotide polymorphism (SNP) coordinates.
The following primers flanking the deletion region were used: forward,
5'-GCAATTAACCAAGACCACCCAATGGCAAG-3'; and reverse,
5'-GTAGCTATTGGGGAGGTTGAGAAGGAG-3'. Three levels of input (1×, 5× and 25×),
shown by ACTB amplification, were used. The PCR products were inserted into the
pCR2.1 TA cloning vector (Invitrogen), sequenced and searched against the human
genome with BLAST to confirm the deletion. For the REXO1L1 deletion on chromosome
8, specific primers flanking the deletion region were designed based on SNP
microarray results. The PCR products were TA-cloned and sequenced as
described above. The following primers were used: forward,
5'-GGCTGACTCCCTTCTGATATAG-3'; and reverse, 5'-CAATCACTTACAGTTACTAGGCAC-3'.
Details about the primer design and PCR protocols are available on request.

Chromosomal mapping of gCISs. Chromosomal maps of gCIS-associated genes
were obtained from the UCSC Mouse Genome Browser (July 2007 assembly).
Each insertion site of a specific CIS was mapped to the gene with the same
orientation as the direction of transcription (arrow in green) or the inverse
orientation to the direction of transcription (arrow in red).

Human medulloblastoma tumour specimens. All tumour specimens were 
obtained in accordance with the Research Ethics Board at The Hospital for Sick 
Children. Surgically resected, fresh frozen samples were obtained from the 
Cooperative Human Tissue Network and the Brain Tumor Tissue Bank. 

SB remobilization. Potential SB insertion sites at Fubp1, Mnat1 or Igf2 in primary
tumours from mouse numbers 143, 14 or 11, or sites at Ptges, Aof1 and Notch2 in
the matched spine metastases, were tested for remobilization. The primers were
designed to amplify each insertion site to produce a product of approximately
300 base pairs (bp) with the insertion site in the middle. PCR products were
sequenced either directly or after being TA-cloned. The resultant sequences were
examined for 'scars' from potential remobilization. As positive controls for the
scars, primers were used to amplify the T2/Onc transposon in each sample. The
products were sequenced and examined for the scars as described above. The
following primers were used: Aof1 forward (Fw), 5'-TACTCCAGACAGTCAGTCAGTG-3';
Aof1 reverse (Rv), 5'-TAGTTCTGCCTCATGCCACAAG-3'; Ptges Fw,
5'-ACAGAGAAGGCTTCAGAGCTC-3'; Ptges Rv, 5'-GGTGCTCTCTGCTGTCCAATC-3';
Notch2 Fw, 5'-CAAGCTTTCAAGTATAAACCACGC-3'; Notch2 Rv,
5'-GAATGCATCATCCAGTGTCCAG-3'; Fubp1 Fw, 5'-AGGAACGGGCTGGTGTTAAAATG-3';
Fubp1 Rv, 5'-TCTAATACCATTTCCTTGGCTTGC-3'; Mnat1 Fw,
5'-CTAACACATCAGAGTTGGACAAG-3'; Mnat1 Rv, 5'-CATGAAGACCTGAGAGTGCAG-3';
Igf2 Fw, 5'-GTGATTGGTGAATGTACTCTTTCC-3'; and Igf2 Rv,
5'-GTGGAACACTAGATTCTGTAGTC-3'. Details about the primer design and PCR
protocols are available on request.

Hierarchical clustering. Agglomerative hierarchical clustering analyses were
performed in the R statistical programming environment (version 2.13). The average
linkage method was used in all cases. Because different data types were used in the
various analyses, the metric used for clustering differed between the analyses. The
Manhattan distance metric was used for the copy number data because the data
were encoded as {−1, 0, 1}. The magnitudes of the CNAs were not considered,
owing to a multitude of confounding factors, including tumour heterogeneity and
ploidy. The Kendall rank correlation was used for the SNV frequency data because
the data distributions were not normal. The Pearson correlation was used for the
methylation data, which were normally distributed.
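
The three distance choices and average linkage can be sketched in plain Python as below. This is not the authors' R code: turning correlations into distances as 1 − r (or 1 − τ) is an assumption, Kendall's coefficient is the simple tau-a form without a tie correction, and the copy number profiles are invented for illustration.

```python
from itertools import combinations
from math import sqrt


def manhattan(x, y):
    """Distance for copy number calls encoded as -1 (loss), 0, 1 (gain)."""
    return sum(abs(a - b) for a, b in zip(x, y))


def pearson_distance(x, y):
    """1 - Pearson r, used here for (approximately normal) methylation values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return 1 - cov / (sx * sy)


def kendall_distance(x, y):
    """1 - Kendall tau-a, a rank-based choice for non-normal SNV frequencies."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        concordant += s > 0
        discordant += s < 0
    n_pairs = len(x) * (len(x) - 1) / 2
    return 1 - (concordant - discordant) / n_pairs


def average_linkage(samples, dist):
    """Agglomerative clustering; returns merges in order as (left, right, height)."""
    clusters = {name: [name] for name in samples}
    d = {frozenset(p): dist(samples[p[0]], samples[p[1]])
         for p in combinations(samples, 2)}
    merges = []
    while len(clusters) > 1:
        def link(a, b):
            # Average linkage: mean distance over all cross-cluster sample pairs
            pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
            return sum(d[frozenset(p)] for p in pairs) / len(pairs)
        a, b = min(combinations(clusters, 2), key=lambda ab: link(*ab))
        merges.append((a, b, link(a, b)))
        clusters[a + "+" + b] = clusters.pop(a) + clusters.pop(b)
    return merges


# Hypothetical CNA calls for one patient (six genomic segments)
cna = {"Pri": [1, 0, -1, 0, 1, 0],
       "Met1": [1, 1, -1, -1, 0, 0],
       "Met2": [1, 1, -1, -1, 0, 1]}
merges = average_linkage(cna, manhattan)
print(merges)
```

In this toy example the two metastases merge first and the primary tumour joins last, which is the pattern described for the patient-matched samples. In R the same choices would typically be expressed by passing a distance object to `hclust(..., method = "average")`.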

Identification of CpG hypermethylation events. Human genomic DNA was
isolated from matching primary and metastatic medulloblastomas obtained from
Johns Hopkins University, Virginia Commonwealth University and New York
University. An EZ DNA Methylation Kit (Zymo Research) was used to
bisulphite convert 500 ng of each sample. The recovered DNA was profiled on
HumanMethylation27 BeadChips (Illumina) at The Centre for Applied
Genomics (TCAG). Subsequently, 27,578 CpG dinucleotides spanning 14,495 genes
were analysed. The probe signal intensity was corrected by using BeadStudio 3.2.0
software (Illumina). The background normalization and differential methylation
analyses were performed against fetal cerebella using the custom error model
(Illumina). Cancer-specific DNA hypermethylation events were defined as those
with a 30% increase in methylation in at least one medulloblastoma sample relative
to an average methylation level (less than 50%) in normal fetal and adult cerebellum
samples. Unsupervised clustering using Euclidean hierarchical clustering metrics
was then performed on 2,503 data points that were filtered for cancer-specific
hypermethylation events. The CpG methylation data are available from the Gene
Expression Omnibus under accession number GSE34356.
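
The cancer-specific hypermethylation criterion can be read as a simple filter on β values (methylation fractions between 0 and 1). This sketch assumes that "30% increase" means an absolute gain of at least 0.30 in β over the mean of the normal cerebellum samples, and that probes whose normal mean is 0.50 or higher are excluded; both readings, the function name `hypermethylated_probes` and the probe values are assumptions.

```python
def hypermethylated_probes(tumour_beta, normal_beta, min_gain=0.30, max_normal=0.50):
    """Return probe IDs called cancer-specifically hypermethylated.

    tumour_beta, normal_beta: dicts mapping probe ID -> list of beta values
    (one per sample). A probe passes if its mean normal methylation is below
    max_normal and at least one tumour exceeds that mean by min_gain or more.
    """
    hits = []
    for probe, normals in normal_beta.items():
        baseline = sum(normals) / len(normals)
        if baseline >= max_normal:
            continue  # already substantially methylated in normal cerebellum
        if any(t - baseline >= min_gain for t in tumour_beta.get(probe, [])):
            hits.append(probe)
    return hits


# Hypothetical probes: two normal samples and two tumours each
normal = {"cg01": [0.05, 0.10], "cg02": [0.60, 0.70], "cg03": [0.20, 0.30]}
tumour = {"cg01": [0.12, 0.55], "cg02": [0.95, 0.99], "cg03": [0.40, 0.45]}
print(hypermethylated_probes(tumour, normal))
```

Only cg01 qualifies here: cg02 is already methylated in normal tissue, and cg03 rises by only 0.20 over its normal baseline.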

Bisulphite sequencing of CpG promoter methylation. Representative examples
of primary-tumour- and metastasis-specific methylation events were identified
from normalized Illumina Hg27 data. Bisulphite PCR (BSP) primers were
designed using the EpiDesigner tool (SEQUENOM) (http://www.epidesigner.com/)
to encompass a genomic region flanking the Illumina Hg27 gene-specific
probe. DNA (500 ng) from the primary tumour and the corresponding metastases
was bisulphite converted using an EZ DNA Methylation Kit. Following PCR
optimization, 10 ng of bisulphite-converted DNA was used to amplify the genomic
regions of interest. Amplicons were subcloned into the pCR2.1-TOPO vector
(Invitrogen), and plasmid DNA from 10-12 colonies was extracted using a
PureLink Quick Plasmid Miniprep Kit (Invitrogen). Sequencing was performed
at TCAG using the M13 reverse primer, 5'-CAGGAAACAGCTATGAC-3'. The
following primers were also used: MLH1 Fw, 5'-TTGTTGGAATGTTATTTATTATTTAGGA-3';
MLH1 Rv, 5'-CATAATATCCACCAAAAAACCAAAA-3';
MRPS21 Fw, 5'-TTTTTGGTTTTTGTTGATTGTTTTT-3'; MRPS21 Rv,
5'-CAAATCTCAAAAAATCTATCCTTTCC-3'; RBP1 Fw, 5'-GTAGGGGAGGTATAGGTAGGTTGTG-3';
RBP1 Rv, 5'-CTTAATCAAACCCCCTAAACAAAAA-3';
WNK2 Fw, 5'-GTGTTTTTGGTTTATAGAGATGGA-3'; and WNK2 Rv,
5'-ACTCCTCCTAATCCRACTCTAC-3'. Details about the primer design and PCR
protocols are available on request.

Alignment and variant calling for whole-exome sequencing. Standard
manufacturers' protocols were used to perform target capture with a TruSeq Exome
Enrichment Kit (Illumina) and sequencing of 100-bp paired-end reads on a HiSeq
sequencing system (Illumina). Approximately 10 gigabases of sequence was
generated for each subject such that >90% of the coding bases of the exome
defined by the Consensus CDS (CCDS) project were covered by at least ten reads.
Adaptor sequences were removed and reads were quality trimmed using the
FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/), and then a custom script was
used to ensure that only read pairs with both mates present were subsequently
used. Reads were aligned to hg19 with BWA, and duplicate reads were marked
using Picard (http://picard.sourceforge.net/) and excluded from downstream
analyses. SNVs and short insertions and deletions (indels) were called using SAMtools
(http://samtools.sourceforge.net/) Pileup and varFilter with the base alignment
quality (BAQ) adjustment disabled and were quality filtered to require at least 20%
of reads supporting the variant call. Variants were annotated using both
ANNOVAR and custom scripts to identify whether they affected protein coding
sequence and whether they had previously been seen in dbSNP131, the 1000
Genomes pilot release (November 2010) or in approximately 160 exomes that
had previously been sequenced at our centre.

SNV analysis of whole-exome sequencing data. For clustering analysis, an SNV 
frequency matrix was constructed by calculating frequencies from the read counts of 
the reference and the alternative nucleotide. The matrix was not standardized (that is, 
converted to z scores) before clustering, because the absolute SNV frequencies were 
of interest. 
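
A sketch of how such a frequency matrix might be built from per-sample read counts. It assumes frequency = alternative reads / (reference + alternative reads), keeps an SNV only if some sample reaches the 20% variant-read threshold from the variant-calling filter, and leaves values unstandardized as stated; the data layout and the function name `snv_frequency_matrix` are hypothetical.

```python
def snv_frequency_matrix(counts, min_freq=0.20):
    """Build an unstandardized SNV x sample frequency matrix.

    counts: dict mapping SNV ID -> dict mapping sample -> (ref_reads, alt_reads).
    Returns (snv_ids, samples, matrix). An SNV is kept only if at least one
    sample reaches min_freq, mirroring a 20% variant-read filter.
    """
    samples = sorted({s for per_snv in counts.values() for s in per_snv})
    snv_ids, matrix = [], []
    for snv, per_sample in sorted(counts.items()):
        row = []
        for s in samples:
            ref, alt = per_sample.get(s, (0, 0))
            depth = ref + alt
            # Absolute variant allele frequency; no z-scoring is applied
            row.append(alt / depth if depth else 0.0)
        if max(row) >= min_freq:
            snv_ids.append(snv)
            matrix.append(row)
    return snv_ids, samples, matrix


# Hypothetical read counts (ref, alt) for one primary tumour and two metastases
counts = {
    "chr1:1000A>G": {"Pri": (40, 40), "Met1": (50, 0), "Met2": (45, 5)},
    "chr2:2000C>T": {"Pri": (60, 0), "Met1": (30, 30), "Met2": (20, 20)},
    "chr3:3000G>A": {"Pri": (99, 1), "Met1": (98, 2), "Met2": (97, 3)},
}
ids, samples, m = snv_frequency_matrix(counts)
print(ids, samples, m)
```

The third SNV never exceeds 3% in any sample and is dropped; the remaining rows keep their raw frequencies, ready for a rank-based (Kendall) distance.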
For Venn analysis, the samples were grouped into primary-metastasis sets, and the
filtered SNVs were used to identify SNVs that are enriched in one sample compared
with all other samples of the same set, as determined by the hypergeometric test (P
value threshold = 0.05). For sets consisting of three or more samples (A, B and C), an
SNV was considered to be enriched in samples A and B if the SNV was enriched in A
compared with C alone and also enriched in B compared with C alone. SNVs that
were not enriched in any sample or subset of samples were considered to be common
SNVs. Many of these common SNVs probably represented germline SNVs specific to
the patient.
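
One possible reading of the enrichment test, sketched with the standard library: under the null, the reads of sample A at an SNV are a random draw from all reads at that site across the compared samples, and A is called enriched when the upper-tail hypergeometric probability of seeing at least its alternative-read count falls below 0.05. Pooling reads this way, and the names `enriched_in` and `enriched_in_pair`, are assumptions; the three-sample rule follows the text.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    total = comb(population, draws)
    upper = min(successes, draws)
    return sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(k, upper + 1)) / total


def enriched_in(sample_counts, other_counts, alpha=0.05):
    """Is an SNV enriched in one sample relative to the pooled other sample(s)?

    sample_counts, other_counts: (ref_reads, alt_reads) at the SNV.
    """
    ref_a, alt_a = sample_counts
    ref_o, alt_o = other_counts
    population = ref_a + alt_a + ref_o + alt_o     # all reads at the site
    p = hypergeom_upper_tail(alt_a, population, alt_a + alt_o, ref_a + alt_a)
    return p < alpha


def enriched_in_pair(a, b, c, alpha=0.05):
    """Three-sample rule: enriched in A and B if each is enriched versus C alone."""
    return enriched_in(a, c, alpha) and enriched_in(b, c, alpha)


# Hypothetical counts: variant present in the primary tumour only
print(enriched_in((40, 40), (50, 0)))              # primary versus metastasis
print(enriched_in_pair((30, 30), (20, 20), (60, 0)))  # two metastases versus primary
```

The upper-tail test is one-sided by design, so a sample with no alternative reads can never be called enriched.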

Analysis of CpG promoter methylation data. The similarities between the
patient-matched metastatic and primary tumour samples and among
patient-matched metastatic tumour samples were determined by using Pearson
correlation analysis. As Pearson's r values are not normally distributed, they were
standardized by Fisher's z transformation. Subsequently, the correlations between
the metastatic samples and the matched primary tumour samples were compared
with the correlations among the patient-matched metastatic samples, using the
paired heteroscedastic Student's t-test.
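
A sketch of the comparison step: each Pearson r is mapped to z = artanh(r), and the two groups of z values are compared with a Welch (unequal-variance) t statistic. Reading "heteroscedastic" as Welch's test is an assumption, converting t to a P value is left to a statistics library, and the correlation values are invented for illustration.

```python
from math import atanh, sqrt
from statistics import mean, variance


def fisher_z(rs):
    """Fisher's z transformation, z = artanh(r), which makes r roughly normal."""
    return [atanh(r) for r in rs]


def welch_t(x, y):
    """Welch's unequal-variance t statistic and Welch-Satterthwaite df."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


# Hypothetical correlations: metastasis vs metastasis, and metastasis vs primary
met_met = fisher_z([0.92, 0.95, 0.90, 0.94])
met_pri = fisher_z([0.80, 0.78, 0.85, 0.75])
t, df = welch_t(met_met, met_pri)
print(f"t = {t:.2f}, df = {df:.1f}")
```

A positive t here means metastases correlate more strongly with each other than with their primary tumour.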

Clustering analysis was performed as described above. The methylation matrix
was not standardized before clustering, as doing so would entail discarding crucial
information on the differences in the overall methylation profiles among samples
or the average methylation among CpG promoters.

The stability of the CpG hypermethylation profile clusters was assessed using 
three methods. First, the clustering analysis was run for different numbers of CpG 
hypermethylation sites that varied most widely among samples. The partitions 
generated by each clustering run were compared with the reference partitions 
generated by the original clustering based on the 1,000 most variable
hypermethylated CpG islands using the Jaccard similarity index. The same analysis was
applied to a set of 100 background hypermethylation data matrices in which the 
sites are permuted independently in each sample. Second, the clustering analysis 
was performed for random subsamples of 1,000 sites, for 1,000 repeat runs. In each 
run, the resultant cluster was compared with the original cluster using the Jaccard 
index. Analysis on the original data matrix was compared with a set of 100 
background matrices, permuted as described above. Third, the cluster stability 
was further assessed by bootstrap resampling of the samples using the pvclust R 
package (version 1.2). 
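
The Jaccard index between two clusterings is commonly computed by pair counting: over all pairs of samples, J = n11 / (n11 + n10 + n01), where n11 counts pairs grouped together in both partitions and n10, n01 count pairs grouped together in only one of them. Whether the analysis above used exactly this pair-counting form is an assumption; `partition_jaccard` and the cluster labels are hypothetical.

```python
from itertools import combinations


def partition_jaccard(labels_a, labels_b):
    """Pair-counting Jaccard similarity between two partitions of the same samples.

    labels_a, labels_b: dicts mapping sample -> cluster label.
    """
    together_both = together_a_only = together_b_only = 0
    for s, t in combinations(sorted(labels_a), 2):
        same_a = labels_a[s] == labels_a[t]
        same_b = labels_b[s] == labels_b[t]
        together_both += same_a and same_b
        together_a_only += same_a and not same_b
        together_b_only += same_b and not same_a
    denom = together_both + together_a_only + together_b_only
    return together_both / denom if denom else 1.0


# Reference partition versus a hypothetical re-run in which Met3 switches cluster
reference = {"Pri": 1, "Met1": 2, "Met2": 2, "Met3": 2}
rerun = {"Pri": 1, "Met1": 2, "Met2": 2, "Met3": 1}
print(partition_jaccard(reference, rerun))
```

Here only one of the four pairs that are co-clustered in either partition (Met1 with Met2) is shared, so J = 0.25; identical partitions give J = 1.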


28. Gnekow, A. K. Recommendations of the brain tumor subcommittee for the 
reporting of trials. Med. Pediatr. Oncol. 24, 104-108 (1995). 
IDH mutation impairs histone demethylation and
results in a block to cell differentiation


Chao Lu, Patrick S. Ward, Gurpreet S. Kapoor, Dan Rohle, Sevin Turcan, Omar Abdel-Wahab, Christopher R. Edwards,
Raya Khanin, Maria E. Figueroa, Ari Melnick, Kathryn E. Wellen, Donald M. O'Rourke, Shelley L. Berger,
Timothy A. Chan, Ross L. Levine, Ingo K. Mellinghoff & Craig B. Thompson


Recurrent mutations in isocitrate dehydrogenase 1 (IDH1) and
IDH2 have been identified in gliomas, acute myeloid leukaemias
(AML) and chondrosarcomas, and share a novel enzymatic property
of producing 2-hydroxyglutarate (2HG) from α-ketoglutarate.
Here we report that 2HG-producing IDH mutants can prevent
the histone demethylation that is required for lineage-specific
progenitor cells to differentiate into terminally differentiated cells.
In tumour samples from glioma patients, IDH mutations were
associated with a distinct gene expression profile enriched for genes
expressed in neural progenitor cells, and this was associated with
increased histone methylation. To test whether the ability of IDH
mutants to promote histone methylation contributes to a block in
cell differentiation in non-transformed cells, we tested the effect of
neomorphic IDH mutants on adipocyte differentiation in vitro.
Introduction of either mutant IDH or cell-permeable 2HG was
associated with repression of the inducible expression of
lineage-specific differentiation genes and a block to differentiation. This
correlated with a significant increase in repressive histone
methylation marks without observable changes in promoter DNA
methylation. Gliomas were found to have elevated levels of similar histone
repressive marks. Stable transfection of a 2HG-producing mutant
IDH into immortalized astrocytes resulted in progressive
accumulation of histone methylation. Of the marks examined, increased
H3K9 methylation reproducibly preceded a rise in DNA
methylation as cells were passaged in culture. Furthermore, we found that
the 2HG-inhibitable H3K9 demethylase KDM4C was induced
during adipocyte differentiation, and that RNA-interference
suppression of KDM4C was sufficient to block differentiation. Together
these data demonstrate that 2HG can inhibit histone demethylation
and that inhibition of histone demethylation can be sufficient to
block the differentiation of non-transformed cells.

The fact that IDH mutations were identified in multiple cancers
with disparate tissues of origin suggests that 2HG-producing mutant
enzymes probably affect some fundamental cellular processes that
facilitate tumour progression. To study the effects of IDH mutations,
we collected and performed gene expression microarray analysis on
tumour specimens from patients with grade II-III oligodendroglioma.
Sequencing results revealed a high frequency of IDH mutations in
oligodendroglioma (33 of the samples had the R132 IDH1 mutation,
2 had the R172 IDH2 mutation and 6 were wild type for IDH1/2).
Supervised analysis found a statistically enriched gene signature in
IDH-mutant samples (q value <10%, fold change >2; Fig. 1a and
Supplementary Table 1) that was independent of tumour grade and
recurrence status and survived multiple testing corrections. Gene-ontology
analysis identified the regulation of astrocyte and glial differentiation
as the top two functional categories enriched in differentially expressed
genes (Supplementary Table 2). We previously reported that IDH
mutation may promote leukaemogenesis by expanding the
haematopoietic progenitor cell population and impairing haematopoietic
differentiation, and that such a phenotype could be attributed at least in
part to mutant IDH-induced inhibition of TET2, an α-ketoglutarate
(αKG)-dependent enzyme potentially involved in DNA
demethylation. Although DNA hypermethylation has been associated with
IDH mutation in glioma samples, no mutations in TET family
members have been found in this disease. We explored the possibility that
IDH mutation may affect additional αKG-dependent enzymes that
contribute to the regulation of cell differentiation.

Histone lysine methylation is an integral part of the post-translational modifications of histone tails that are important for chromatin organization and regulation of gene transcription. In vitro, 2HG can competitively inhibit a family of αKG-dependent Jumonji-C domain histone demethylases (JHDMs). To determine whether IDH-associated changes in histone methylation could be observed in cells, we ectopically expressed wild-type or mutant IDH1 or IDH2 in 293T cells and found that mutant IDH1 or IDH2 led to a marked increase in histone methylation compared to the wild-type enzymes. Transient transfection of wild-type IDH2 can also lead to increased 2HG production. In all of the samples, the magnitude of
increase in methylation correlated with the intracellular 2HG levels 
produced by IDH transfection (Fig. 1b and Supplementary Fig. 1). To 
test whether histone lysine methylation was dysregulated in gliomas 
with IDH mutation, immunohistochemistry analysis of patient oligo- 
dendroglioma samples was performed for several well-characterized 
histone marks. Compared to tumours with wild-type IDH, there was a 
statistically significant increase in the repressive trimethylation of 
H3K9 (H3K9me3) and an increasing trend in trimethylation of 
H3K27 (H3K27me3) in tumours with IDH1 mutation (Fig. 1c). No 
statistically significant difference was seen in trimethylation of H3K4 
(H3K4me3), a mark associated with active transcription (data not 
shown). These data suggested that IDH mutations might preferentially 
affect the regulation of repressive histone methylation marks in vivo. 

As IDH mutations were associated with glial tumours of the ‘proneural’ phenotype, we sought to determine whether the persistence of histone repressive marks promoted by mutant IDH was sufficient to block the differentiation of non-transformed cells. Upon stimulation with a differentiation cocktail, immortalized murine 3T3-L1 cells undergo extensive chromatin remodelling, resulting in their maturation into adipocytes. 3T3-L1 cells transduced with R172K mutant IDH2 produced 2HG whereas cells transduced with either wild-type IDH2 or vector alone did not (Fig. 2a). All three cell types were then induced to differentiate into adipocytes. After 7 days of differentiation induction, IDH mutant cells had visibly reduced lipid droplet accumulation compared to vector and IDH wild-type cells, as shown by Oil-Red-O staining (Fig. 2b). In separate experiments, stable transfection of R140Q mutant IDH2 also resulted in a block to adipocyte differentiation (data not shown). To determine whether 2HG was sufficient to mediate the effect of mutant IDH on cell differentiation, we synthesized cell-permeable 1-octyl-D-2-hydroxyglutarate (octyl-2HG; Supplementary Fig. 2). Treatment of 3T3-L1 cells with octyl-2HG led to a dose-dependent inhibition of lipid accumulation (Fig. 2c). Gene expression analysis showed that despite exposure to a well-standardized differentiation protocol, IDH mutant cells or cells treated with octyl-2HG exhibited a profound defect in the expression of transcription factors essential for executing adipogenesis (Cebpa and Pparg) and of an adipocytic lineage-specific gene (Adipoq) (Fig. 2d, e), suggesting that these cells failed to execute adipocyte differentiation.

Cells were harvested for a chromatin immunoprecipitation (ChIP) assay using antibodies against H3K9me3 and H3K27me3, before or after 4 days of differentiation induction. Quantitative polymerase chain reaction (PCR) with primers targeting promoters of Cebpa and Adipoq revealed that at day 4 there was a statistically significant increase in H3K9me3 and H3K27me3 at promoters of both genes in IDH mutant cells (Fig. 2f). These repressive marks also showed a modest but significant increase at gene promoters before differentiation induction. In contrast, quantitative assessment of DNA methylation at promoters of Cebpa and Adipoq by MassARRAY failed to reveal any significant difference between IDH wild-type and mutant cells (Supplementary Fig. 3). In addition to gene-specific changes, we detected a global increase in H3K9 methylation and a reciprocal decrease in H3 acetylation (Fig. 2g and Supplementary Fig. 4).

Figure 1 | IDH mutations are associated with dysregulation of glial differentiation and global histone methylation. a, Heatmap representation of a two-dimensional hierarchical clustering of genes identified as differentially expressed between IDH mutant (mut) patient oligodendroglioma samples and IDH wild-type (WT) samples. Each row represents a gene and each column represents a specimen. IDH mutational status, tumour grade and recurrence of each sample are listed. b, 293T cells transfected with empty vector (Vec), wild-type or R132H mutant IDH1, or wild-type or R172K mutant IDH2 for 3 days were lysed and assessed for expression levels of IDH1, IDH2 and histone lysine methylation by western blotting with specific antibodies. Total H3 was used as loading control. Quantification of band intensities is shown in Supplementary Fig. 1. Bottom panel provides quantification of the intracellular 2HG to glutamate ratio (2HG/glutamate) as previously reported for these transfectants. c, Immunohistochemistry staining with antibodies against H3K9me3 and H3K27me3 in IDH1/2 wild-type and IDH1 mutant oligodendroglioma samples (×40 magnification). Image quantification was performed using Metamorph software (see Methods) and is shown in bottom panels. Error bars represent standard deviation (s.d.) of at least three patient samples in each group.

To determine whether IDH mutation was sufficient to induce enhanced repressive histone methylation in central nervous system (CNS)-derived cells and whether it was associated with altered neural
gene expression, we retrovirally transduced immortalized normal 
human astrocytes (NHAs) with either wild-type or R132H mutant 
IDH1. Compared to parental cells, late-passage cells expressing 
mutant IDH exhibited elevated levels of a variety of histone methyla- 
tion marks (Fig. 3a), and this correlated with an enhanced expression 
of the neural marker nestin (Fig. 3b). IDH mutations have been associated with CpG-island hypermethylation, and consistent with this we observed that total CpG methylation was increased in IDH mutant cells (Supplementary Fig. 5). Because histone repressive marks can promote DNA methylation and vice versa, we studied the temporal
relationship of histone and DNA methylation in IDH-expressing 
astrocytes (Fig. 3c-e and Supplementary Fig. 5). The first observable 
change of the histone marks we examined was H3K9me3. H3K9me3 
levels were significantly elevated by passage 12 after cells were infected 
with mutant IDH. Changes in other histone methylation marks were 
either delayed and of lower magnitude (H3K27me3 and H3K79me2) 
or were not observed (H3K4me3). Increases in DNA methylation were 
never observed before passage 17 and the difference in DNA methyla- 
tion reached statistical significance only at passage 22. 

To test whether the IDH1 R132H mutation could interfere with neural differentiation in the absence of prolonged adaptation in culture, primary neurosphere cultures established from the brains of p16/p19−/− mice were infected with a retroviral construct containing IDH1 R132H mutant, wild-type IDH1 or the vector alone (Supplementary Fig. 6). After infection the cells were re-plated under conditions that promote astrocyte differentiation and induced to differentiate further by treatment with retinoic acid without further passaging. IDH mutant cells failed to induce expression of the astrocytic marker GFAP and exhibited expression of the neural marker β3-tubulin (Fig. 3f). When the differentiation conditions were supplemented with retinoic acid, enhanced expression of the astrocytic marker GFAP was observed in IDH wild-type and vector cells but GFAP expression remained repressed in IDH mutant cells.

The enhancement of H3K9 methylation in mutant IDH-expressing cells from multiple tissues of origin led us to investigate whether this H3K9 methylation might be sufficient to block the ability of non-transformed cells to execute differentiation. Support for this hypothesis came with the discovery that KDM4C (also known as JMJD2C), an H3K9-specific JHDM, was induced in 3T3-L1 cells during differentiation (Fig. 4a). An in vitro histone demethylase assay with recombinant human GST-tagged KDM4C confirmed that KDM4C effectively removed H3K9me2 and H3K9me3 in the presence of αKG. Importantly, the demethylation reaction was inhibited by 2HG in a dose-dependent manner (Fig. 4b and Supplementary Fig. 7). Given the structural similarity between 2HG and αKG, the inhibition of KDM4C by 2HG would be predicted to be competitive. Consistently, increasing the concentration of αKG in the reaction mixture reversed the inhibition of H3K9 demethylation by 2HG (Fig. 4c).

Figure 2 | Differentiation arrest induced by mutant IDH or 2HG is associated with increased global and promoter-specific H3K9 and H3K27 methylation. a, 3T3-L1 cells stably expressing empty vector, wild-type, or R172K mutant IDH2 were lysed and assessed for expression levels of IDH2 or IDH1 by western blotting. Cells were also extracted for intracellular metabolites, which were then MTBSTFA-derivatized (see Methods) and analysed by gas chromatography–mass spectrometry (GC-MS). The quantification of 2HG signal intensity relative to the intrasample glutamate signal is shown for a representative experiment. b, Cells were induced to differentiate into mature adipocytes for 7 days. The accumulation of lipid droplets was assessed by Oil-Red-O staining. Wells from a representative experiment from a total of four independent experiments are shown. c, 3T3-L1 cells were induced to differentiate for 7 days in the absence or presence of 1 mM or 2 mM octyl-2HG. Oil-Red-O staining was performed and quantified by measuring absorbance at 500 nm. DMSO, dimethylsulphoxide. d, Vector, wild-type or R172K mutant IDH2 transduced 3T3-L1 cells were induced to differentiate for 4 days. At days 0 and 4 (d0 and d4), RNA was extracted. Relative expression of adipocyte-specific gene and transcription factors was assessed by quantitative PCR with reverse transcription (RT-qPCR). e, 3T3-L1 cells were induced to differentiate for 4 days in the absence or presence of 1 mM or 2 mM octyl-2HG. RNA was extracted. Relative expression of adipocyte-specific gene and transcription factors was assessed by RT-qPCR. f, Vector, wild-type or R172K mutant IDH2 transduced 3T3-L1 cells were induced to differentiate. At days 0 and 4 (d0 and d4), ChIP analysis was performed using antibodies against H3K9me3 and H3K27me3. Immunoprecipitated Cebpa and Adipoq promoter sequences were analysed by qPCR and shown as percentage of input. g, Vector, wild-type or R172K mutant IDH2 transduced 3T3-L1 cells were induced to differentiate for 4 days. At days 0 and 4 (d0 and d4), histones were acid-extracted and levels of H3K9me3, H3K9me2 and acetyl-H3 were assessed by western blotting with specific antibodies. Total H3 was used as loading control. Quantification of band intensities is shown in Supplementary Fig. 4. In f, error bars indicate s.d. from triplicate wells and a representative experiment from a total of two is shown. For all other experiments, error bars indicate s.d. from three independent experiments. *P < 0.05; **P < 0.01; NS, not significant.

Finally, to test the possibility that H3K9 demethylation is a required component of adipocyte differentiation, we examined whether blocking the induction of KDM4C was sufficient to impair the differentiation of
3T3-L1 cells. Treatment with three independent short interfering 
RNAs (siRNAs) against KDM4C reduced its expression and enhanced 
H3K9me3 in 3T3-L1 cells (Fig. 4d and Supplementary Fig. 8). After 
differentiation induction, cells treated with KDM4C siRNAs exhibited 
reduced ability to differentiate into adipocytes. Thus the inability to 
erase repressive H3K9 methylation can be sufficient to impair the 
differentiation of non-transformed cells. 

Biochemical studies suggest that 2HG is a universal inhibitor of JHDM family members; it was therefore interesting to observe that H3K9 demethylation seemed to be more sensitive to mutant IDH-induced suppression than at least some other histone methylation marks. Future investigation of differences in sensitivity to 2HG inhibition among JHDM family members and/or of cellular feedback mechanisms activated after defective histone demethylation will be needed to explain this selectivity. In addition to the data presented here, evidence is mounting for a direct role of histone methylation in stem cell maintenance, differentiation and tumorigenesis (see Supplementary Discussion). Our findings support a role for αKG-dependent demethylases in cell differentiation that can be impaired through the cellular accumulation of 2HG produced by IDH mutation.

00 MONTH 2012 | VOL 000 | NATURE | 3

©2012 Macmillan Publishers Limited. All rights reserved

[Figure 3 image: panels a–f; western blots of parental, IDH1 WT and IDH1 R132H NHA cells (a, b; H3K9me3, H3K4me3, H3K27me3, nestin, p85), histone blots at passages 4, 7, 12, 17, 22 and 27 (c), quantification versus passage for IDH1 WT and IDH1 R132H (d, e), and β3-tubulin and GFAP blots after 0, 0.25, 0.5 or 1 µM retinoic acid (f).]

Figure 3 | IDH mutation induces histone methylation increase in CNS-derived cells and can alter cell lineage gene expression. a, Immortalized NHA cells were retrovirally transduced with constructs containing wild-type or R132H mutant IDH1. Histones were acid-extracted from parental cells or cells expressing wild-type or mutant IDH1 at late (>40) passages. Histone lysine methylation levels were assessed by western blotting with specific antibodies. Total H3 was used as loading control. Images presented are panels from different areas of the same gel. b, Parental, IDH1 wild-type and R132H mutant NHA cells at late passages were lysed and assessed for expression levels of nestin by western blotting. p85 was used as loading control. Images presented are panels from different areas of the same gel. c, NHA cells were retrovirally transduced with constructs containing wild-type or R132H mutant IDH1. Histones were acid-extracted at different time points as cells were passaged in culture. Histone lysine methylation levels were assessed by western blotting with specific antibodies. Total H3 was used as loading control. Images presented are panels from different areas of the same gel. d, Western blot band intensities of H3K9me3 in c and two additional independent experiments were quantified using Image J. Red squares indicate IDH1 wild-type cells. Green triangles indicate IDH1 R132H mutant cells. e, Total CpG methylation of IDH1 wild-type and R132H mutant NHA cells at various passages was measured by FACS using 5-methylcytosine-specific antibody and shown as normalized mean fluorescence intensity. FACS histograms from a representative experiment are shown in Supplementary Fig. 5. f, Neurosphere cultures established from the subventricular zone of brains of p16/p19−/− mice were infected with a retroviral construct containing IDH1 R132H mutant, wild-type IDH1 or the vector alone and induced to differentiate. GFAP and β3-tubulin expression levels were assessed by western blotting. p85 was used as loading control. In d and e, error bars indicate standard error of the mean from three independent experiments. *P < 0.05; **P < 0.01.

Figure 4 | 2HG-inhibitable H3K9 demethylase KDM4C is required for cell differentiation. a, 3T3-L1 cells were induced to differentiate (Diff) for 3 days. Before (d0) and each day after differentiation induction, cells were lysed and KDM4C protein levels were assessed by western blotting with specific antibody. Tubulin was used as loading control. b, Bulk histones were incubated with purified GST-tagged human KDM4C in the reaction mix with 1 mM αKG and increasing concentrations of D-2HG. Levels of GST tag, H3K9me3 and H3K9me2 were assessed by western blotting with specific antibodies. Total H3 was used as loading control. c, Bulk histones were incubated with purified GST-tagged human KDM4C in the reaction mix. 10 mM D-2HG was added to inhibit the demethylation reaction in the presence of increasing concentrations of αKG. Levels of H3K9me3 were assessed by western blotting with specific antibody. Total H3 was used as loading control. d, 3T3-L1 cells were transfected with control siRNA or siRNA specific for KDM4C. After 3 days, cells were lysed and assessed for expression levels of KDM4C and H3K9me3 by western blotting with specific antibodies. Total H3 was used as loading control. Cells from the same treatment were induced to differentiate for 7 days. The accumulation of lipid droplets was assessed by Oil-Red-O staining. Wells from a representative experiment from a total of three independent experiments are shown.


METHODS SUMMARY 


Details about histone extraction, GC-MS and ChIP assay can be found in 
Methods. In brief, 3T3-L1 cell differentiation, Oil-Red-O staining, and in vitro 
histone demethylase assay were performed as previously described.


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS


Patient samples, microarray, gene-ontology analysis. Primary oligodendroglioma 
samples were obtained with approval from the institutional review board at the 
University of Pennsylvania and were de-identified for the study. For microarray 
analysis, tumour sample RNA was extracted with Trizol and purified with Qiagen 
RNeasy, and then assayed on an Affymetrix Human Gene 1.0ST array. Significance 
Analysis of Microarrays (http://www-stat.stanford.edu/~tibs/SAM/sam.pdf) was 
applied to find differentially expressed genes (q value <10% and fold change >2). 
Functional analysis of differentially expressed genes was done using the DAVID tool 
(http://david.abcc.ncifcrf.gov/home.jsp) using all human genes as a background set. 
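The gene-selection rule above (q value <10% and fold change >2) can be illustrated with a short filter. This is a sketch, not SAM: SAM estimates its false discovery rate by permutation, whereas here Benjamini–Hochberg q-values serve as a simpler stand-in. The function names and the input layout (gene mapped to a linear mutant/wild-type fold change and a p-value) are hypothetical.

```python
from typing import Dict, List, Tuple


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted q-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def select_genes(stats: Dict[str, Tuple[float, float]],
                 q_cutoff: float = 0.10,
                 fc_cutoff: float = 2.0) -> List[str]:
    """stats maps gene -> (fold_change, p_value), fold change on a linear scale."""
    genes = list(stats)
    qvals = benjamini_hochberg([stats[g][1] for g in genes])
    selected = []
    for gene, q in zip(genes, qvals):
        fc = stats[gene][0]
        # Keep genes changed at least fc_cutoff-fold in either direction.
        if q < q_cutoff and (fc > fc_cutoff or fc < 1.0 / fc_cutoff):
            selected.append(gene)
    return selected
```

Treating down-regulation symmetrically (fold change below 0.5) mirrors the usual reading of "fold change >2" for both directions; this interpretation is an assumption.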
3T3-L1 cell differentiation, Oil-Red-O staining. 3T3-L1 cell differentiation and Oil-Red-O staining were carried out as described previously. In brief, confluent 3T3-L1 cells were stimulated with a cocktail containing 0.5 mM isobutylmethylxanthine, 1 µM dexamethasone, 5 µg/ml insulin and 5 µM troglitazone (all from Sigma) to induce differentiation. Cells were maintained in medium with insulin after 2 days
of differentiation until ready to be harvested. For Oil-Red-O staining, cells were 
washed in PBS and then fixed for 20 min at room temperature (25 °C) with 3% 
paraformaldehyde. Cells were then washed with de-ionized water and stained 
with Oil-Red-O solution. For quantification, Oil-Red-O staining was dissolved 
in isopropanol and absorbance was measured at 500 nm. 
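One way to turn the 500 nm readings into the relative lipid accumulation compared across conditions (for example vector versus octyl-2HG-treated cells) is to subtract a blank and divide by the control mean. This is a minimal sketch; the isopropanol-only blank and the normalization to a control well set are our assumptions, not steps stated in the protocol.

```python
from statistics import mean
from typing import Sequence


def relative_lipid(a500_sample: Sequence[float],
                   a500_control: Sequence[float],
                   a500_blank: float = 0.0) -> float:
    """Blank-subtracted mean A500 of a condition as a fraction of the control."""
    sample = mean(a500_sample) - a500_blank
    control = mean(a500_control) - a500_blank
    if control <= 0:
        raise ValueError("control signal must exceed the blank")
    return sample / control
```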

In vitro histone demethylase assay. The histone demethylase assay was carried 
out as described previously. In brief, 4 µg bulk calf thymus histones (Sigma) were incubated with GST-tagged KDM4C (1.42 µg; BPS Bioscience) in a reaction mix containing 50 mM Tris-HCl pH 8.0, protease inhibitor cocktail, 1 mM αKG, 100 µM FeSO4 and 2 mM ascorbic acid at 37 °C for 4 h, in the absence or presence of various concentrations of D-2HG or L-2HG (Sigma). Reaction mixtures were
analysed by western blotting using specific antibodies. 
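The prediction that 2HG inhibits KDM4C competitively, and that added αKG should relieve the inhibition (Fig. 4c), follows from the Michaelis–Menten rate law, in which a competitive inhibitor raises the apparent Km by a factor (1 + [I]/Ki). The sketch below is illustrative only: the Km and Ki values are hypothetical placeholders, not measured constants for KDM4C.

```python
def velocity(s_mM: float, i_mM: float, km_mM: float = 0.1,
             ki_mM: float = 1.0, vmax: float = 1.0) -> float:
    """Michaelis-Menten rate with a competitive inhibitor (Km scaled by 1 + [I]/Ki)."""
    apparent_km = km_mM * (1.0 + i_mM / ki_mM)
    return vmax * s_mM / (apparent_km + s_mM)


def fractional_activity(s_mM: float, i_mM: float,
                        km_mM: float = 0.1, ki_mM: float = 1.0) -> float:
    """Activity with inhibitor relative to the uninhibited reaction at the same [S]."""
    return velocity(s_mM, i_mM, km_mM, ki_mM) / velocity(s_mM, 0.0, km_mM, ki_mM)


if __name__ == "__main__":
    # 10 mM D-2HG as in Fig. 4c, with increasing alphaKG (mM).
    for akg in (1.0, 2.0, 5.0, 10.0):
        print(akg, round(fractional_activity(akg, 10.0), 2))
```

With these placeholder constants, 10 mM 2HG leaves roughly half of the activity at 1 mM αKG but about 90% at 10 mM αKG, which is the qualitative rescue expected for a competitive inhibitor.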

Cell culture, transfection and transduction, generation of cell lines. 293T cells, NHA cells immortalized by E6/E7/hTERT (provided by R. Pieper) and 3T3-L1 cells were cultured in Dulbecco’s modified Eagle’s medium (DMEM; Invitrogen) with 10% fetal bovine serum (FBS; CellGro). For expression of wild-type and mutant IDH1/2 in 293T cells, transfection was performed with Lipofectamine 2000 (Invitrogen) according to the manufacturer’s instructions. For generation of IDH2 retrovirus and transduction of 3T3-L1 cells, supernatant from 293T cells transfected with pCL-Eco helper virus and plasmids was collected after 72 h, filtered and applied to cells overnight. For generating 3T3-L1 cell lines with stable expression of wild-type or mutant IDH2, cells were grown in 2.5 µg ml−1 puromycin for 7 days after retroviral transduction. Pooled populations of puromycin-resistant cells were obtained, and then continuously cultured in puromycin. For generation of IDH1 retrovirus and transduction of NHA cells, GP2-293 cells (Clontech, 631458) were calcium phosphate transfected with equal amounts of pVSV-G (Clontech, 631512) and plasmids. Virus was harvested at day 2 and day 3 after transfection and placed on logarithmically growing cells. After infection, cells were placed in 800 µg ml−1 G418 (Invitrogen) to generate stable cell lines. For siRNA knockdown of KDM4C, transfections were performed with Lipofectamine RNAiMAX (Invitrogen), using siRNAs targeting KDM4C (#1: sense, 5′-GCUUGAAUCUCCCAAGAUATT-3′; antisense, 5′-UAUCUUGGGAGAUUCAAGCTT-3′; #2: sense, 5′-CAAAGUAUCUUGGAUCAAATT-3′; antisense, 5′-UUUGAUCCAAGAUACUUUGCC-3′; #3: sense, 5′-GAGGAGUUUCGGGAGUUCAACAAAU-3′; antisense, 5′-AUUUGUUGAACUCCCGAAACUCCUC-3′) or a non-targeting control (Dharmacon, #D-001810-01-20) at a
concentration of 40 nM. 
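A quick consistency check on the listed siRNA duplexes is that the duplex-forming core of each sense strand is the exact reverse complement of the 5′ end of its antisense strand. The sketch below assumes a 19-nt core for the 21-mers with 3′ overhangs (#1, #2) and a blunt 25-nt duplex for #3; the core lengths and helper names are our assumptions.

```python
def reverse_complement_rna(seq: str) -> str:
    """Reverse complement of an RNA sequence (T is read as a DNA overhang base)."""
    pairs = {"A": "U", "U": "A", "G": "C", "C": "G", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def duplex_core_matches(sense: str, antisense: str, core_len: int) -> bool:
    """True if the first core_len nt of the sense strand pair perfectly with the
    first core_len nt of the antisense strand (overhangs excluded)."""
    core = sense[:core_len]
    return reverse_complement_rna(core) == antisense[:core_len].upper()


SIRNAS = {
    "#1": ("GCUUGAAUCUCCCAAGAUATT", "UAUCUUGGGAGAUUCAAGCTT", 19),
    "#2": ("CAAAGUAUCUUGGAUCAAATT", "UUUGAUCCAAGAUACUUUGCC", 19),
    "#3": ("GAGGAGUUUCGGGAGUUCAACAAAU", "AUUUGUUGAACUCCCGAAACUCCUC", 25),
}

if __name__ == "__main__":
    for name, (sense, antisense, core_len) in SIRNAS.items():
        print(name, duplex_core_matches(sense, antisense, core_len))
```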

Mutational analysis. For IDH mutation analysis, tumour genomic DNA was extracted and the regions surrounding IDH1 codon 132 and IDH2 codons 140 and 172 were amplified by PCR followed by sequencing. IDH1 analysis used forward primer 5′-ACCAAATGGCACCATACGA-3′ and reverse primer 5′-TTCATACCTTGCTTAATGGGTGT-3′ for amplification, and primer 5′-CGGTCTTCAGAGAAGCCATT-3′ for sequencing. IDH2 analysis used forward primer 5′-CAGAGACAAGAGGATGGCTAGG-3′ and reverse primer 5′-GTCTGCCTGTGTTGTTGCTTG-3′ for amplification, and the same forward primer for sequencing. Of the 42 tumours analysed, 41 had sufficient high-quality genomic DNA for determining IDH mutation status. The one sample that could not be classified as either IDH wild type or mutant was excluded from further analysis.

Plasmid construction. The cDNA clone of human IDH1 (BC012846.1) was 
purchased from the American Type Culture Collection in pCMV-Sport6, and 
human IDH2 (BC009244) was purchased from Invitrogen in pOTB7. Standard 
site-directed mutagenesis techniques were used to generate IDH1 R132H by 
introducing a G395A base-pair change in the IDH1 open reading frame (ORF). 
IDH2 R172K was made by introducing a G515A change in the IDH2 ORF, while 
IDH2 R140Q was made with a G419A alteration. Wild-type and mutant sequences 
were then subcloned into LPC vector. All sequences were confirmed by direct 
sequencing before expression in 293T cells and retrovirus generation. Retroviral 
constructs used for neurosphere infection were generated by excising wild-type IDH1 and IDH1 R132H with NotI and PacI restriction enzymes from the previously made vectors and incorporating them into the pQCXIH (Clontech, 631516)
retroviral vector. 
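The base changes used for mutagenesis map onto the intended codons by simple arithmetic: with ORF position 1 taken as the A of the ATG start codon (our assumption), a 1-based position p lies in codon (p − 1) // 3 + 1. The reference codons below (CGT for IDH1 R132, CGG for IDH2 R140, AGG for IDH2 R172) are supplied here for illustration and are not stated in the text; the helper names are hypothetical.

```python
from typing import Tuple

# Minimal codon table covering only the codons used in this example.
CODON_TABLE_SUBSET = {"CGT": "R", "CAT": "H", "CGG": "R", "CAG": "Q",
                      "AGG": "R", "AAG": "K"}


def codon_of(orf_pos: int) -> Tuple[int, int]:
    """Return (codon number, position within codon) for a 1-based ORF position."""
    return (orf_pos - 1) // 3 + 1, (orf_pos - 1) % 3 + 1


def apply_substitution(ref_codon: str, pos_in_codon: int,
                       ref_base: str, alt_base: str) -> str:
    """Apply a single-base change to a codon after checking the reference base."""
    if ref_codon[pos_in_codon - 1] != ref_base:
        raise ValueError("reference base does not match codon")
    return ref_codon[:pos_in_codon - 1] + alt_base + ref_codon[pos_in_codon:]


if __name__ == "__main__":
    for label, pos, ref_codon in (("IDH1 G395A", 395, "CGT"),
                                  ("IDH2 G419A", 419, "CGG"),
                                  ("IDH2 G515A", 515, "AGG")):
        codon, within = codon_of(pos)
        mutant = apply_substitution(ref_codon, within, "G", "A")
        print(label, codon, CODON_TABLE_SUBSET[ref_codon], "->", CODON_TABLE_SUBSET[mutant])
```

Each G>A change falls at the second base of its codon, converting Arg to His (R132H), Gln (R140Q) or Lys (R172K).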

Histone extraction and western blotting. For histone acid extraction, cells were lysed in hypotonic lysis buffer (10 mM HEPES, 10 mM KCl, 1.5 mM MgCl2, 0.5 mM DTT, protease inhibitors) for 1 h. H2SO4 was added to 0.2 N and samples were incubated overnight at 4 °C with rotation. After centrifugation, the supernatant was collected and proteins were precipitated in 33% TCA, washed with acetone, and resuspended in de-ionized water. For whole-cell lysates, cells were lysed and sonicated in standard RIPA buffer (1% sodium deoxycholate, 0.1% SDS, 1% Triton X-100, 0.01 M Tris pH 8.0 and 0.14 M NaCl), and lysates were then centrifuged at 14,000g at 4 °C for
10 min. Supernatants were collected and measured for total protein concentration. 
For western blotting, lysates were separated by SDS-PAGE, transferred to 
nitrocellulose membrane, blocked in 5% non-fat milk in PBS containing 0.5% 
Tween-20, probed with primary antibodies and detected with horseradish- 
peroxidase-conjugated anti-rabbit or anti-mouse antibodies (GE Healthcare, 
NA934V and NA931V). Primary antibodies used were: anti-IDH1 (Proteintech, 
12332-1-AP), anti-IDH2 (Abcam, ab55271), anti-GST tag (Millipore, 05-311), 
anti-H3K9me2 (Cell Signaling Tech, 9753), anti-H3K9me3 (Abcam, ab8898), 
anti-H3K36me3 (Abcam, ab9050), anti-H3K27me3 (Millipore, 17-622), anti- 
H3K4me3 (Millipore, 17-614), anti-H3K79me2 (Cell Signaling Tech, 9757), 
anti-KDM4C (Abcam, ab85454), anti-acetyl H3 (Upstate, 06-599), anti-H3 (Cell 
Signaling Tech, 4499), anti-tubulin (Sigma, T9026), anti-GFAP (Cell Signaling Tech, 3670), anti-β3-tubulin (Cell Signaling Tech, 5666), anti-p85 (Millipore, 06-195), anti-nestin (Millipore, MAB5326), anti-β-actin (Sigma, A5316). Anti-IDH1
R132H mutant antibody was a gift from Agios Pharmaceuticals. Quantification 
of western blot band intensity was performed using Image J software according to 
the manufacturer’s instructions. 

Metabolite extraction, GC-MS. After gentle removal of culture medium, cells were rapidly quenched with ice-cold 80% methanol and incubated at −80 °C for 20 min. After sonication, extracts were centrifuged at 14,000g for 20 min at 4 °C to remove precipitated protein and the aqueous metabolites in the supernatant
layer were dried under nitrogen gas. For 293T cells, organic acids were further 
purified by redissolving the dried extract in de-ionized water, followed by elution 
from an AG-1 X8 100-200 anion exchange resin (Bio-Rad) in 3 N HCl after wash- 
ing with five column volumes. 

For GC-MS analysis, dried extracts were redissolved in a 1:1 mixture of 
acetonitrile and N-methyl-N-tert-butyldimethylsilyltrifluoroacetamide (MTBSTFA; 
Regis) and heated for 75 min at 70 °C to derivatize metabolites. Samples were then 
injected into an Agilent 7890A GC with an HP-5MS capillary column, connected to 
an Agilent 5975C mass selective detector operating in splitless mode using electron impact ionization with an ionizing voltage of −70 eV and the electron multiplier set to 1,060 V. GC temperature was started at 100 °C for 3 min, ramped up to 230 °C at 4 °C min−1 and held for 4 min, then ramped up to 300 °C and held for 5 min. A mass range of 50–500 a.m.u. was recorded at 2.71 scans per second. Identification of the
2HG metabolite peak was confirmed using standards obtained from Sigma. The 
2HG and glutamate signal intensities were quantified by integration of peak areas. 
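The 2HG/glutamate ratio reported in Figs 1b and 2a rests on integrated peak areas. The sketch below integrates a baseline-subtracted trace with the trapezoidal rule over a retention-time window for each metabolite and takes the ratio. It uses a single intensity trace for simplicity; in practice each metabolite would be integrated on its own ion chromatogram, and the windows, baseline handling and function names here are all assumptions.

```python
from typing import Sequence, Tuple


def peak_area(times: Sequence[float], intensities: Sequence[float],
              start: float, end: float, baseline: float = 0.0) -> float:
    """Trapezoidal area of the baseline-subtracted signal within [start, end]."""
    pts = [(t, max(y - baseline, 0.0))
           for t, y in zip(times, intensities) if start <= t <= end]
    area = 0.0
    for (t0, y0), (t1, y1) in zip(pts, pts[1:]):
        area += 0.5 * (y0 + y1) * (t1 - t0)
    return area


def ratio_2hg_to_glutamate(times: Sequence[float], intensities: Sequence[float],
                           window_2hg: Tuple[float, float],
                           window_glu: Tuple[float, float],
                           baseline: float = 0.0) -> float:
    """Ratio of the 2HG peak area to the glutamate peak area."""
    a_2hg = peak_area(times, intensities, *window_2hg, baseline=baseline)
    a_glu = peak_area(times, intensities, *window_glu, baseline=baseline)
    return a_2hg / a_glu
```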
Quantitative real-time PCR. RNA was isolated using Trizol (Invitrogen). After 
incubating with DNase, cDNA was synthesized using Superscript II reverse tran- 
scriptase (Invitrogen). Quantitative PCR was performed on a 7900HT Sequence 
Detection System (Applied Biosystems) using Taqman Gene Expression Assays 
(Applied Biosystems). Gene expression data was normalized to 18S rRNA. 
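The text states only that expression was normalized to 18S rRNA. A common way to do this is the comparative Ct (2^−ΔΔCt) method, sketched below under the assumption of near-100% amplification efficiency for both assays; the choice of this method and the calibrator (for example vector cells at day 0) are our assumptions.

```python
def relative_expression(ct_target: float, ct_ref: float,
                        ct_target_cal: float, ct_ref_cal: float,
                        efficiency: float = 2.0) -> float:
    """Comparative Ct: target normalized to a reference gene (e.g. 18S rRNA)
    and expressed relative to a calibrator sample."""
    delta_sample = ct_target - ct_ref
    delta_calibrator = ct_target_cal - ct_ref_cal
    return efficiency ** -(delta_sample - delta_calibrator)
```

A sample whose target Ct is three cycles later than the calibrator's, at equal 18S Ct, would score 2^−3 = 0.125.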
ChIP. ChIP was performed with the Millipore Magna ChIP G kit (Millipore, 17-611). In brief, 2,000,000 cells were cross-linked with 1% formaldehyde for 10 min at room temperature. After washing with cold PBS, cells were centrifuged and lysed in 500 µl SDS lysis buffer for 10 min on ice. Lysate was then sonicated using a Bioruptor sonicator (Diagenode) to shear DNA to approximately 200–600 bp. Samples were spun down and 50 µl of the supernatant was used for each immunoprecipitation overnight with magnetic beads after tenfold dilution. Primary antibodies (3 µg per ChIP) used were: anti-H3K9me3 (Abcam, ab8898) and anti-H3K27me3 (Millipore, 17-622). Normal rabbit IgG (Millipore, 12-370) was used as control and showed minimal enrichment. The next day, samples were washed in low-salt immune complex buffer, high-salt immune complex buffer, LiCl immune complex buffer and TE buffer. Histone complexes were eluted in elution buffer plus proteinase K for 2 h at 65 °C. DNA was recovered using columns. Quantitative PCR was performed on purified DNA samples. Primers used were: Adipoq forward, 5′-ATGGCTGAACCACACAGCTTCA-3′; reverse, 5′-AGGGGTCAGGAGACCTCCCTTT-3′; Cebpa forward, 5′-CTGGAAGTGGGTGACTTAGAGG-3′; reverse, 5′-GAGTGGGGAGCATAGTGCTAG-3′. Ct values were converted
to percentage of input. 
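Conversion of Ct values to percentage of input usually adjusts the input Ct to represent all of the chromatin and then compares it with the immunoprecipitated sample. The sketch below assumes ~100% amplification efficiency; the input fraction (for example a 1% input) is a hypothetical parameter because the amount of input used is not stated.

```python
import math


def percent_input(ct_ip: float, ct_input: float, input_fraction: float) -> float:
    """Percent of input for one ChIP-qPCR target (assumes a doubling per cycle)."""
    # Shift the input Ct so it represents 100% of the starting chromatin.
    ct_input_adjusted = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (ct_input_adjusted - ct_ip)
```

For a 1% input at Ct 25 and an immunoprecipitate at Ct 28, this gives 0.125% of input.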

Quantitative DNA methylation analysis. Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry using EpiTyper by MassARRAY (Sequenom) was performed on bisulphite-converted DNA extracted from 3T3-L1 cells. MassARRAY primer design was done as previously described.
Immunohistochemistry. Immunohistochemistry detection was performed using 
Discovery XT processor (Ventana Medical Systems). The tissue sections were 
blocked for 30 min in 10% normal goat serum in 0.2% BSA/PBS, followed by 
incubation for 5 h with 0.1 µg ml−1 of the rabbit polyclonal anti-H3K9me3 (Abcam, ab8898) or 1 µg ml−1 rabbit polyclonal anti-H3K27me3 (Millipore, 07-449) antibodies and incubation for 60 min with biotinylated goat anti-rabbit IgG (Vector labs, PK6101) at 1:200 dilution. The detection was performed with the DAB-MAP kit (Ventana Medical Systems). The entire slides were scanned by Zeiss Mirax Scan (Carl Zeiss) using a ×20/0.8 objective. The scanned
image was exported into image analysis software, Metamorph (Molecular 
Devices). The colour threshold for DAB-positive nuclei was determined 
and set for all images. Areas above the threshold for the DAB signal and 
for haematoxylin-counterstained total nuclei were measured in an automated 
fashion. The ratio between the two parameters was calculated and analysed for
statistical significance. 
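The image metric described above is a ratio of two thresholded areas: DAB-positive area over haematoxylin-counterstained nuclear area. The sketch below assumes the DAB and nuclear signals have already been separated into per-pixel channels (for example by colour deconvolution) given as nested lists; the thresholds, channel layout and function names are hypothetical and do not reproduce Metamorph's implementation.

```python
from typing import Sequence


def area_above(channel: Sequence[Sequence[float]], threshold: float) -> int:
    """Number of pixels whose value exceeds the threshold."""
    return sum(1 for row in channel for value in row if value > threshold)


def dab_to_nuclei_ratio(dab: Sequence[Sequence[float]],
                        nuclei: Sequence[Sequence[float]],
                        dab_threshold: float,
                        nuclei_threshold: float) -> float:
    """DAB-positive area divided by total nuclear area, using fixed thresholds
    applied identically to every image."""
    total = area_above(nuclei, nuclei_threshold)
    if total == 0:
        raise ValueError("no nuclear area above threshold")
    return area_above(dab, dab_threshold) / total
```

Keeping the thresholds fixed across all images, as the protocol specifies, is what makes the ratios comparable between IDH wild-type and mutant tumours.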

Synthesis of 1-octyl-D-2-hydroxyglutarate. Commercial R(−)-tetrahydro-5-oxofuran-2-carboxylic acid (140 mg, 1.076 mmol) was dissolved in H2O (1 ml), cooled to 0 °C and treated with 1 N KOH (2.16 ml, 2.15 mmol). The resulting solution was stirred at this temperature for 5 min and at ambient temperature for 2 h. It was then concentrated to dryness under reduced pressure and dried. The residue was dissolved in trifluoroacetic anhydride (8 ml) at 0 °C, stirred for 30 min at 0 °C and for 2 h at room temperature, and the volatiles were then evaporated under reduced pressure. The residue was dried and dissolved in anhydrous tetrahydrofuran (6 ml). Octanol (0.3 ml, 2.1 eq.) was added to the solution at 0 °C and the mixture was stirred overnight at ambient temperature. Water was added to quench the reaction, and the mixture was extracted with EtOAc. The combined extracts were dried over MgSO4, concentrated and purified by flash chromatography (EtOAc:hexane 1:3 and 1:1) to give 1-octyl-D-2-hydroxyglutarate (110 mg, 39%).
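The stated quantities can be checked from molecular formulas. The formulas used here are our own derivation, not given in the text: 5-oxotetrahydrofuran-2-carboxylic acid as C5H6O4 and the 1-octyl monoester of 2-hydroxyglutarate as C13H24O5 (2HG plus octanol minus water). With average atomic masses, 140 mg of starting material is about 1.076 mmol, and 110 mg of product is about 39% of the theoretical yield.

```python
from typing import Dict

# Average atomic masses in g/mol.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}


def molar_mass(formula: Dict[str, int]) -> float:
    """Molar mass (g/mol) from element counts."""
    return sum(ATOMIC_MASS[element] * n for element, n in formula.items())


def percent_yield(mmol_limiting: float, product_formula: Dict[str, int],
                  product_mg: float, equiv: float = 1.0) -> float:
    """Isolated mass as a percentage of the theoretical product mass."""
    theoretical_mg = mmol_limiting * equiv * molar_mass(product_formula)
    return 100.0 * product_mg / theoretical_mg


if __name__ == "__main__":
    lactone_acid = {"C": 5, "H": 6, "O": 4}
    octyl_2hg = {"C": 13, "H": 24, "O": 5}
    print(round(140 / molar_mass(lactone_acid), 3))   # mmol of starting material
    print(round(percent_yield(1.076, octyl_2hg, 110), 1))
```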
Neurosphere isolation, culture and differentiation. Ink4a/Arf-null
(p16/p19−/−) mice were killed six days postpartum, and the isolated subventricular zones
were subjected to chemical (Pronase E, Calbiochem 7433-2) and mechanical dissociation
to obtain a single-cell suspension in full neurobasal medium (Neurobasal
medium, GIBCO 21103; B27 supplement without retinoic acid, GIBCO 12587-010;
Glutamax, GIBCO 35050; 20 ng ml⁻¹ EGF, R&D Systems 236-EG; 20 ng ml⁻¹
basic FGF, Millipore GF003). On the next day the cells were spun down and
re-suspended in fresh medium, and once neurospheres had formed in culture,
the spheres were collected and chemically dissociated (Accumax, Innovative Cell
Technologies AM105) back into single cells in fresh medium.

One day after final infection, infected neurospheres and a non-infected control 
were placed in 400 µg ml⁻¹ Hygromycin B (InvivoGen, ant-hg-1). Once selection
was complete, isogenic cell lines maintained in full neurobasal medium were 
chemically dissociated into single cells and plated at the same density in full 
neurobasal medium with increasing concentrations of retinoic acid (Sigma- 
Aldrich, R2625). Seventy-two hours later cells were harvested, and expression of 
proteins was analysed by western blotting. 

Measurement of total CpG methylation. DNA methylation was assessed as
previously described’. In brief, 1 × 10⁶ NHA cells were washed with PBS,
fixed with 2% paraformaldehyde for 10 min at room temperature and permeabilized
with 0.5% Triton X-100 for 10 min. Cells were then treated with 2 N HCl for 20 min
at room temperature and subsequently neutralized with 100 mM Tris-HCl, pH 8.0.
Cells were incubated with anti-5-methylcytosine antibody (Calbiochem, NA 81) at
1:100 dilution for 30 min at room temperature. After washing with PBS, cells were
incubated with an Alexa Fluor 488-conjugated secondary antibody (Invitrogen)
for 30 min in the dark. Flow cytometry was performed on a Becton Dickinson
FACSCalibur flow cytometer, and data were analysed using FlowJo software.

Statistical analysis. All statistical analysis was performed using Student’s t-test 
(two-sample equal variance; two-tailed distribution). 
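For reference, a self-contained sketch of the two-sample, equal-variance (pooled) Student's t-test with a two-tailed P value, using only the Python standard library. The two-tailed P value is obtained from the regularized incomplete beta function, P = I_x(ν/2, 1/2) with x = ν/(ν + t²), evaluated with the standard continued-fraction method; the function names are illustrative. In practice a library routine such as `scipy.stats.ttest_ind(a, b, equal_var=True)` computes the same quantity.

```python
import math

def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def student_t_test(sample1, sample2):
    """Two-sample Student's t-test, equal variances, two-tailed. Returns (t, df, p)."""
    n1, n2 = len(sample1), len(sample2)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    m1, m2 = sum(sample1) / n1, sum(sample2) / n2
    ss1 = sum((v - m1) ** 2 for v in sample1)
    ss2 = sum((v - m2) ** 2 for v in sample2)
    df = n1 + n2 - 2
    pooled_var = (ss1 + ss2) / df
    se = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    if se == 0.0:
        raise ValueError("both groups have zero variance")
    t = (m1 - m2) / se
    p = reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))
    return t, df, p

t, df, p = student_t_test([1, 2, 3], [4, 5, 6])
print(f"t = {t:.3f}, df = {df}, P = {p:.4f}")  # t = -3.674, df = 4, P = 0.0213
```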
hypermethylator phenotype


Sevin Turcan, Daniel Rohle, Anuj Goenka, Logan A. Walsh, Fang Fang, Emrullah Yilmaz, Carl Campos,
Armida W. M. Fabius, Chao Lu, Patrick S. Ward, Craig B. Thompson, Andrew Kaufman, Olga Guryanova, Ross Levine,
Adriana Heguy, Agnes Viale, Luc G. T. Morris, Jason T. Huse, Ingo K. Mellinghoff & Timothy A. Chan


Both genome-wide genetic and epigenetic alterations are fun- 
damentally important for the development of cancers, but the 
interdependence of these aberrations is poorly understood. 
Glioblastomas and other cancers with the CpG island methylator 
phenotype (CIMP) constitute a subset of tumours with extensive 
epigenomic aberrations and a distinct biology’*. Glioma CIMP 
(G-CIMP) is a powerful determinant of tumour pathogenicity, 
but the molecular basis of G-CIMP remains unresolved. Here we show
that mutation of a single gene, isocitrate dehydrogenase 1 (IDH1),
establishes G-CIMP by remodelling the methylome, which in turn
reorganizes the transcriptome.
Examination of the epigenome of a large set of intermediate- 
grade gliomas demonstrates a distinct G-CIMP phenotype that is 
highly dependent on the presence of IDH mutation. Introduction 
of mutant IDH1 into primary human astrocytes alters specific 
histone marks, induces extensive DNA hypermethylation, and 
reshapes the methylome in a fashion that mirrors the changes 
observed in G-CIMP-positive lower-grade gliomas. Furthermore, 
the epigenomic alterations resulting from mutant IDH1 activate 
key gene expression programs, characterize G-CIMP-positive 
proneural glioblastomas but not other glioblastomas, and are 
predictive of improved survival. Our findings demonstrate that 
IDH mutation is the molecular basis of CIMP in gliomas, provide 
a framework for understanding oncogenesis in these gliomas, and 
highlight the interplay between genomic and epigenomic changes in 
human cancers. 

The isocitrate dehydrogenase genes IDH1 and IDH2 are mutated in
>70% of lower-grade gliomas (grades II and III), in some glioblastomas*”, 
and in leukaemias and several other cancers®’. The most common IDH1 
mutations in glioma (>95%) result in an amino acid substitution at 
arginine 132 (R132), which resides in the enzyme’s active site. Mutation 
of IDH imparts the ability to produce 2-hydroxyglutarate (2-HG), a 
potential oncometabolite*’. Alterations in the methylation landscape 
have been shown to have important roles during oncogenesis''. CIMP 
has emerged as a distinct molecular subclass of tumours in a number of 
human malignancies, including glioblastoma'*. This phenotype is 
associated with extensive, coordinated hypermethylation at specific 
loci'*"**. In glioblastomas, G-CIMP is associated with the proneural 
subgroup of tumours and IDH mutation’. Exactly how mutant IDH 
promotes tumorigenesis and causes G-CIMP—or CIMP in any type of 
human cancer—is unknown. 

To determine whether IDH1 mutation directly causes G-CIMP, we
used immortalized primary human astrocytes'’* and constructed
isogenic cells expressing either mutant IDH1 (R132H), wild-type IDH1,
or neither. These astrocytes are well characterized’*’. Introduction of
wild-type IDH1 and the R132H IDH1 mutant resulted in equal protein
expression (a modest threefold increase) (Fig. 1a). Expression of
mutant but not wild-type IDH1 in human astrocytes resulted in the 
production of 2-HG (Fig. 1b). To determine whether mutant IDH1 
altered the methylation landscape, we analysed genomic DNA from 
these cells using the Illumina Infinium HumanMethylation450 platform.
The platform provides genome-wide coverage and is both well 
validated and highly reproducible’*”. 

Previous data demonstrated that de novo DNA methylation in 
in vitro models occurs over extended periods, requiring time to ‘lock 
in’ epigenomic changes'*”°. We thus analysed the methylomes of 
astrocytes expressing mutant or wild-type IDH1 over successive 
passages (up to 50). Analysis using self-organizing maps demonstrated 
that mutant IDH1 progressively remodelled the glial methylome over
time (Fig. 1c, d), an effect that was not seen in control astrocytes.
Expression of mutant IDH1 caused a marked increase in hypermethylation
at a large number of genes, although there was a small group of
hypomethylated genes as well (Fig. 1e and Supplementary Fig. 1a and
Supplementary Table 1). Surprisingly, expression of wild-type IDH1
also reshaped the methylome but in a manner that differed from effects
due to expression of mutant IDH1 (Fig. 1f). Expression of wild-type
IDH1 caused hypomethylation at specific loci, suggesting that both the
production of 2-HG and the levels of α-ketoglutarate can affect the
methylome. Unsupervised hierarchical clustering of the methylome
data showed that the hypermethylated genes included both genes that
underwent de novo methylation and genes that originally
possessed low levels of methylation but subsequently acquired high
levels of methylation (Fig. 1e). Control astrocytes did not undergo
these methylome changes (Fig. 1c, d). Mutant IDH1-induced remodel- 
ling of the methylome was progressive and reproducible, and resulted 
in significant changes in gene expression (Fig. 1f and Supplementary 
Fig. 1a, Supplementary Tables 2 and 3). 

We sought to define the methylation targets of mutant IDH in 
astrocytes. Of the 44,334 CpG sites that were differentially methylated 
in mutant IDH-expressing cells, 30,988 sites were hypermethylated 
(3,141 unique genes with promoter CpG island methylation changes; 
Supplementary Table 1). Transcriptional module mapping showed 
that the genes undergoing methylation changes were highly enriched 
for polycomb repressive complex 2 (PRC2)-targeted loci (Supplementary Fig. 1b
and Supplementary Table 4)'**". These observations demonstrate that 
mutant IDH1 is sufficient to reshape the epigenome by altering the 
global methylation landscape. 

Lower-grade gliomas (LGGs; World Health Organization grades II
and III) and secondary glioblastomas are biologically distinct from
primary or de novo glioblastomas”’.

Figure 1 | Introduction of mutant IDH1 into human astrocytes remodels
the methylome. a, Expression of wild-type and mutant IDH1 (R132H) in
immortalized human astrocytes (passage 5). b, Overexpression of mutant
IDH1 but not wild-type (WT) IDH1 in human astrocytes leads to production of
2-HG*. Error bars show 1 standard deviation (s.d.) (n = 2). c, Self-organizing
map (SOM) analysis of methylome data for wild-type IDH1-expressing,
mutant IDH1-expressing (R132H), and parental (control) cell lines shows
changes in the methylome in mutant IDH1-expressing and wild-type
IDH1-expressing astrocytes, compared to parental cells. Mosaic patterns are
pseudocoloured SOMs from different time points (P indicates passage number). Tile
colours indicate methylation level of centroids. d, Hierarchical clustering
showing divergence of the methylome of IDH1-expressing astrocytes from that
of parental astrocytes. MUT, mutant; PAR, parental. e, Heatmap showing the
10,678 most significant differentially methylated probes (ANOVA) in IDH1
mutant astrocytes and parental astrocytes (passages 2 and 40). Colour scale
indicates β values. f, Kinetics of differential methylation in mutant and
wild-type-expressing astrocytes. Error bars indicate inter-quartile range (n = 2).

Present knowledge of G-CIMP is based on the examination of primary glioblastomas in which IDH
mutations are infrequent'*”. To determine the impact of IDH muta- 
tion on the methylation landscape in primary LGGs, we generated a 
high-resolution, genome-wide set of LGG methylome data from 
patients with complete clinical follow-up using the same Infinium 
450K platform as described earlier (72 WHO grade II and III gliomas; 
Fig. 2 and Supplementary Table 5). We first performed consensus 
clustering (Fig. 2a and Supplementary Fig. 2a) and unsupervised 
hierarchical clustering (Fig. 2b and Supplementary Fig. 2b) to identify 
LGG subgroups. We identified two robust DNA methylation clusters, 
one encompassing tumours with markedly high methylation levels 
(cluster 2) and another without the hypermethylated loci (cluster 1). 
Cluster 2 tumours demonstrated a characteristic DNA methylation 
profile with highly coordinated cancer-specific methylation at a subset of
loci, concordant with the G-CIMP phenotype defined in glioblastomas 
(Supplementary Fig. 2b and Supplementary Table 6)'. The composition 
of the G-CIMP group in these LGGs was confirmed by two independent
clustering methods (K-means consensus and two-dimensional
hierarchical clustering) (Fig. 2a, b).

Figure 2 | Global epigenetic analysis of LGGs reveals dependence of
G-CIMP on IDH mutation. a, Identification of G-CIMP by K-means
consensus clustering of LGG samples. Unsupervised clustering was performed
with the most variant probes (9,711 probes, top 2%). Tumours are listed in the
same order along the x and y axes. G-CIMP status is indicated by the black and
white bars. Consensus index values range from 0 to 1, with 0 being dissimilar
(white) and 1 being similar (red). K = 2 is identified by the Lorenz curve.
b, Two-dimensional (2D) hierarchical clustering of the same probes as in
a identified the same two clusters. Each row represents a tumour and each
column represents a probe. CIMP and IDH mutation status are indicated by the
colour code. The level of DNA methylation (β value) for each probe is
represented by colour scale (red, methylated; blue, non-methylated). Only
cancer-specific events were used”’. c, Kaplan-Meier survival curve of Memorial
Sloan-Kettering Cancer Center (MSKCC) patients (n = 72) with LGG (grade II
and III). d, Receiver operating characteristic (ROC) curve comparing the
sensitivity and specificity of G-CIMP status with those of MGMT
methylation or MGMT expression status in LGGs. Areas under the curve are
noted in the inset. G-CIMP, MGMT methylation and MGMT expression were
determined as described in Methods.

Probes defining CIMP in LGGs
included those in CpG islands and shores (Supplementary Fig. 2c, d) 
and were enriched for PRC2-target genes (Supplementary Table 7). 
Global expression profiles showed that G-CIMP+ tumours possessed
markedly different transcriptional profiles from G-CIMP− tumours
(Supplementary Tables 8 and 9). EpiTYPER (Sequenom) mass 
spectrometry was used to validate the methylation status of loci in both 
the astrocyte model and in the tumours (Supplementary Fig. 2e-g)”’. 
To determine the mutational status of IDH1 and IDH2, we sequenced 
the entire coding sequence of the two genes in all the samples above 
(Fig. 2b). Ninety-eight per cent (49/50) of the G-CIMP+ tumours 
possessed either an IDH1 mutation or IDH2 mutation. Notably, none
of the G-CIMP− tumours possessed mutant IDH (Supplementary
Fig. 2h). These genomic data show that G-CIMP is highly dependent
on the presence of IDH mutation and that IDH mutations are absent (0%)
from CIMP− LGGs. Currently, the methylation status of
O-6-methylguanine DNA methyltransferase (MGMT) is a widely 
used molecular biomarker for glioblastoma prognosis and response to 
temozolomide™*. In LGGs, G-CIMP was associated with markedly better
clinical endpoints (Fig. 2c and Supplementary Figs 3-6, Supplemen- 
tary Tables 10 and 11). Importantly, G-CIMP was significantly superior 
to MGMT methylation or MGMT messenger RNA expression as a 
predictor of survival (Fig. 2d). 
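The areas under the ROC curves in Fig. 2d summarize how well each marker separates two outcome groups. A minimal sketch, using the rank (Mann-Whitney) interpretation of the AUC: the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half. The scores and labels are hypothetical, and how the survival endpoint was dichotomized for the ROC analysis is not specified here, so the binary labels are an assumption.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    scores: predictor values (higher = more likely positive)
    labels: 1 for positive cases, 0 for negative cases
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a concordant pair
    return wins / (len(pos) * len(neg))

# Hypothetical binary marker (e.g. a status coded 1/0) used as the score
print(roc_auc([1, 1, 0, 1, 0, 0], [1, 1, 1, 0, 0, 0]))  # 6/9, about 0.667
```

A binary marker such as CIMP status yields a ROC curve with a single interior point, so ties are frequent and the half-credit rule matters.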

We next sought to define the nature of the methylome differences 
between IDH mutant and wild-type tumours and characterize the 
effects of these differences on the LGG transcriptome. Figure 3a shows 
a principal component analysis (PCA) of methylome and expression 
data from our tumours. PCA shows that the G-CIMP+ and G-CIMP−
LGG methylome subgroups correspond to marked transcriptome
differences (Fig. 3a). Of the 140,016 sites that were differentially
methylated between IDH mutant and wild-type tumours, 121,660 
were hypermethylated (Supplementary Table 6). There were 2,611 
unique genes with alterations in promoter CpG islands represented 
in this group. Consistent with the results in Fig. 2b, a volcano plot
showing differentially methylated genes between G-CIMP+ and
G-CIMP− tumours was highly asymmetric (Fig. 3b). A starburst plot
(Fig. 3b) shows the relationship between DNA methylation and
expression. Integration of the normalized gene expression and
DNA methylation gene sets identified 429 genes with both significant 
hypermethylation and downregulation and 176 genes that were 
hypomethylated and upregulated in G-CIMP+ LGGs (Supplemen- 
tary Table 12). Among these genes are those known to be involved 
in glioma initiation and outcome, including CDKN2C and GAP43 
(refs 25, 26). 

As a critical experiment to prove causality between IDH1 mutation 
and G-CIMP, we performed an in-depth comparison of methylation 
marks and gene expression alterations between human astrocytes 
expressing mutant IDH1 and the LGGs with endogenous IDH1 muta- 
tion. We first focused on the comparison of methylation marks and 
found that both sets of methylome alterations targeted similar loci. 
Gene set enrichment analysis (GSEA) of the mutant IDH1-induced 
methylation changes in the isogenic astrocyte system (Fig. 1) and the 
G-CIMP genes demonstrated very significant enrichment and con- 
cordance (Fig. 3c and Supplementary Table 13 and Supplementary 
Fig. 7). Importantly, the genes that were methylated after mutant 
IDH1 expression correctly classified LGG tumours into CIMP+ or 
CIMP− groups with very high accuracy (Fig. 3d and Supplementary
Table 14). To confirm the impact of these alterations on glioma patho- 
biology, we used the transcriptomic footprint of mutant IDH to generate 
an expression signature (mutant IDH repression signature) composed 
of the most significantly methylated and downregulated genes in both 
the isogenic astrocyte system and the G-CIMP gene set (17 genes; 
Supplementary Table 15). As expected, this signature classified an inde- 
pendent LGG cohort (Rembrandt) into two distinct subgroups (Fig. 3e 
and Supplementary Figs 8-10 and Supplementary Table 16). Together, 
our findings show that introduction of mutant IDH reprograms the 
epigenome and generates the foundations of G-CIMP. 
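The survival separation between subgroups (Fig. 3e, and Fig. 2c for the MSKCC cohort) is displayed as Kaplan-Meier curves. A minimal sketch of the product-limit estimator, assuming one follow-up time and one event indicator per patient (1 = death observed, 0 = censored); the data are hypothetical, and the log-rank test used for the P value is not reproduced here.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier product-limit estimate.

    times:  follow-up time for each patient
    events: 1 if the event (death) was observed, 0 if the patient was censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share this time point
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after time t
        n_at_risk -= removed
    return curve

print(kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0]))
```

Patients censored at a time point are still counted as at risk at that time, the usual convention when deaths and censoring coincide.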

IDH mutation is highly enriched in the CIMP+, proneural subgroup 
of glioblastomas. Using data from The Cancer Genome Atlas (TCGA), 
we applied the mutant IDH repression signature as a classifier to the 
transcriptomes of all four subgroups of glioblastomas”’. The signature 
segregated IDH mutant and wild-type proneural glioblastomas into two 
distinct subgroups associated with very different prognoses, but did not 
do so in other glioblastoma subgroups (Supplementary Fig. 11a, b). 
These data demonstrate that mutant IDH-induced epigenomic altera- 
tions have profound biological implications within the proneural class 
of glioblastomas that are specific for this subclass. Comparison of gene 
expression programs that occur in astrocytes expressing mutant IDH1 
to those in LGG tumours that harbour the IDH mutation showed 
remarkable similarity (Fig. 4a and Supplementary Fig. 12). Moreover, 
introduction of mutant but not wild-type IDH1 into astrocytes resulted 
in the upregulation of nestin (and other genes associated with stem cell
identity), coinciding with the increase in DNA methylation, and in the adoption
of a neurosphere/stem-like phenotype (Fig. 4b and Supplementary
Fig. 13)’*. These data suggest that mutant IDH1 functions by interfering
with the differentiation state.

Figure 3 | IDH1 mutation directly generates the methylation patterns
present in G-CIMP tumours. a, The methylomes and transcriptomes of LGGs
are distinct. PCA plot of LGG tumours for all methylation probes (left) and
expression probes (right) (n = 52). PC, principal component. b, Starburst plot
(left) for comparison of DNA methylation and gene expression. The log10
(FDR-corrected P value) for DNA methylation (β value; x axis) and for
gene expression (y axis) is plotted for each gene. Black dotted line shows the
FDR-adjusted P value of 0.05. Red points indicate downregulated and
hypermethylated genes in G-CIMP+ LGGs versus G-CIMP− LGGs. Blue
points show hypomethylated and upregulated genes. Volcano plot (right) of all
CpG loci analysed for G-CIMP association. The β-value difference in DNA
methylation between G-CIMP+ and G-CIMP− tumours is plotted along the x
axis. The P value between G-CIMP+ and G-CIMP− tumours is plotted on the
y axis (−log10 scale). Red indicates significantly different probes.
c, Concordance between hypermethylated sites in mutant IDH1-expressing
astrocytes and G-CIMP+ LGGs. GSEA shows significant enrichment between
730 hypermethylated unique CpG sites identified in IDH1 mutant astrocytes
(ANOVA between passage 2 and 40) and those present in G-CIMP+ gliomas.
GSEA correlation shown in colour scale. ES, enrichment score; FDR, false
discovery rate; FWER, familywise error rate; NES, normalized enrichment
score; NOM, nominal P value. d, Differential methylation in IDH mutant
astrocytes correctly classifies G-CIMP in the human LGGs. Two-dimensional
unsupervised hierarchical clustering of 81 human gliomas with top variant
probes (n = 10,000) from mutant IDH1 astrocytes. Tumours are shown on the
y axis, probes along the x axis. Methylation (β value) for each probe is
represented with the colour scale. G-CIMP classification as determined by the
astrocyte-derived data is denoted by the colour bars at the left. e, Kaplan-Meier
survival curve of 115 patients with grade II or grade III gliomas in the
Rembrandt Database grouped by CIMP status. P value calculated by log rank.

Figure 4 | Functional implications of IDH1-mutation-induced alterations
in the glioma epigenome. a, Concordance of transcriptional programs
regulated by mutant IDH1 in astrocytes and G-CIMP in LGGs. P value for
significance is shown along the x axis. Yellow lines indicate threshold of
significance (P = 0.05). b, Mutant IDH1 results in the expression of markers of
self-renewal and stem cell identity. Left, mutant IDH1 results in expression of
nestin. P indicates passage number. Right, expression of mutant IDH1
promotes the adoption of a neurosphere phenotype. Astrocytes (passage 15)
that express IDH1 R132H or wild-type IDH1 were used in the neurosphere assay.
Error bars indicate 1 s.d. **P < 0.01 (t-test). c, Alterations in histone marks in
IDH1-mutant-expressing human astrocytes. Left, western blot results are
shown using the indicated antibodies. Astrocytes are from passage 27. Right,
ChIP of the indicated histone marks for representative hypermethylated genes.
Error bars indicate 1 s.d. *P < 0.05. d, Mutant IDH1 inhibits the production of
5hmC in human astrocytes. Left, mutant IDH1 inhibits TET2-dependent 5hmC
production in astrocytes. Parental astrocytes were infected with lentivirus
directing the expression of TET2 catalytic domain and green fluorescent
protein (GFP) + mutant IDH1. FACS analyses are shown for 5hmC. Right,
astrocytes expressing IDH1 R132H (passage 10) have less 5hmC than astrocytes
that do not express the IDH1 mutant.

Our data show that IDH1 mutation is the mechanistic cause of 
G-CIMP. To gain further insight, we determined the effects of mutant 
IDH1 on histone alterations in our astrocyte system. Figure 4c (left)
shows that expression of the IDH1 mutant increases levels of H3K9me2, 
H3K27me3 and H3K36me3, consistent with previous findings”. 
Chromatin immunoprecipitation (ChIP) experiments examining
representative genes that undergo hypermethylation show that H3K9 and
H3K27 methylation are both enriched in cells expressing mutant IDH1
(Fig. 4c, right). As both of these marks can promote DNA methylation, 
alterations in histone marks may contribute to the accumulation of 
DNA methylation. 

Next, we determined the effects of the mutation on TET2-dependent
5-hydroxymethylcytosine (5hmC) levels. We used a well-established
assay””’ and first confirmed that we were able to detect TET-dependent 
alterations in 5hmC (Supplementary Fig. 14). We found that expression 
of the IDH1 mutant in astrocytes resulted in a significant decrease in 
5hmC (Fig. 4d, right). Expression of TET2 in the astrocytes produced 
5hmC, which was inhibited by mutant but not wild-type IDH1 (Fig. 4d, 
left). Because TET-mediated production of 5hmC is a primary mode of
DNA demethylation”, inhibition of this activity in the IDH1-mutant- 
expressing astrocytes may be a mechanistic basis for accumulation of 
DNA methylation, ultimately leading to a CIMP pattern. 

IDH mutation and the CIMP phenotype are two very common 
features in cancer, the underlying mechanisms for which are obscure. 
The fundamental questions regarding these features are (1) how the 
IDH mutation contributes to oncogenesis, and (2) what the root cause 
of CIMP is. Our data address these important questions by demon- 
strating that IDH mutation is the cause of CIMP and leads to the CIMP 
phenotype by stably reshaping the epigenome. This remodelling 
involves modulating patterns of methylation on a genome-wide scale, 
changing transcriptional programs and altering the differentiation 
state. Our observations suggest that the activity of IDH may form 
the basis of an ‘epigenomic rheostat’, linking alterations in cellular 
metabolism to the epigenetic state. In summary, these data provide a 
mechanistic framework for how IDH mutation leads to oncogenesis 
and the molecular basis of CIMP in gliomas. We believe our observa- 
tions have critical implications for the understanding of gliomas and 
the development of novel therapies for this disease. 


METHODS SUMMARY 


Cell culture. Immortalized human astrocytes were a gift from R. O. Pieper 
(University of California, San Francisco) and were prepared as previously 
described"*. Cells were cultured in Dulbecco’s modified Eagle’s medium 
(DMEM) plus 10% fetal bovine serum (FBS; Invitrogen). Expression of IDH 
was accomplished by cloning wild-type or mutant IDH1 (R132H) into the vector
pLNCX2. These constructs were used to produce retroviruses for infection of
target cells. Selection was performed using G418. All experiments were performed
in duplicate.

Tumours. All tumours (n = 81) were obtained following surgical resection at the 
MSKCC as part of routine clinical care and snap frozen. Tumours were obtained in 
accordance with Institutional Review Board policies at the MSKCC. Each sample 
was examined histologically with haematoxylin-and-eosin-stained cryostat sections 
by a neuropathologist. Before analysis, tumours were sectioned and microdissected.
Genomic DNA or RNA was extracted using the DNeasy kit (Qiagen) or RNeasy 
Lipid Tissue Mini kit (Qiagen) per the manufacturer’s instructions. 

Genomic analysis. Expression analysis of astrocytes and tumours was performed 
using the Affymetrix U133 2.0 microarray. Genome-wide methylation analysis 
was performed using the Illumina Infinium HumanMethylation450 bead array.
Processing of the arrays was as per the manufacturer’s protocol. Methylation data
were extracted using GenomeStudio software (Illumina). Methylation values for
each site are expressed as a β value, representing a continuous measurement from
0 (completely unmethylated) to 1 (completely methylated). This value is based on
the following calculation: β value = (signal intensity of methylation-detection
probe) / (signal intensity of methylation-detection probe + signal intensity of
non-methylation detection probe). 
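A minimal sketch of the β-value calculation defined above, together with a logit transform of the kind applied before the statistical analysis described in the Methods. The optional `offset` is an assumption: Illumina's software is often described as adding a constant (commonly 100) to the denominator to stabilize low-intensity probes, while `offset=0` reproduces the formula exactly as stated here. The log2 form M = log2(β/(1 − β)) is one common choice; the base used in the authors' analysis is not stated.

```python
import math

def beta_value(meth, unmeth, offset=0.0):
    """beta = M / (M + U + offset), from 0 (unmethylated) to 1 (methylated).

    meth:   signal intensity of the methylation-detection probe (M)
    unmeth: signal intensity of the non-methylation-detection probe (U)
    offset: optional stabilizing constant (assumption; 0 gives the formula in the text)
    """
    total = meth + unmeth + offset
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return meth / total

def m_value(beta, eps=1e-6):
    """log2 logit of beta (the 'M value'); beta is clipped away from 0 and 1."""
    b = min(max(beta, eps), 1.0 - eps)
    return math.log2(b / (1.0 - b))

b = beta_value(3000.0, 1000.0)
print(f"{b:.3f} {m_value(b):.3f}")  # 0.750 1.585
```

The logit transform spreads out β values near 0 and 1, where β is compressed, which makes the data better suited to linear models such as ANOVA.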


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS


Cell culture. Immortalized human astrocytes were a gift from R. O. Pieper 
(University of California, San Francisco) and were prepared as previously 
described'*. Cells were cultured in Dulbecco’s modified Eagle’s medium 
(DMEM) plus 10% fetal bovine serum (FBS; Invitrogen). Expression of IDH 
was accomplished by cloning wild-type or mutant IDH1 (R132H) into the vector 
pLNCX2. These constructs were used to produce retroviruses for infection
of target cells. The retroviral packaging cell line GP-293 was seeded in
10-cm-diameter dishes and (at 30-50% confluency) was transfected using Lipofectamine
(Invitrogen) with pVSV-G (Clontech) and pLNCX2-IDH1 wild type or IDH1
R132H. Retroviral particles were collected and filtered through a 0.45-µm syringe
filter, and polybrene was added (8 µg ml⁻¹ final concentration) to infect the human
astrocytes for 12 h. Stable transfectants were selected with G418 and pooled popu- 
lations of G418-resistant cells expressing either wild-type IDH1 or IDH1 R132H 
were confirmed by western blot analysis with anti-IDH1 antibody (rabbit anti- 
IDH1; Cell Signaling). All experiments were performed in duplicate. 

Tumours. All tumours (n = 81) were obtained following surgical resection at the
MSKCC as part of routine clinical care, and snap frozen. Tumours were obtained 
in accordance with Institutional Review Board policies at the MSKCC. Each 
sample was examined histologically by a neuropathologist. Before analysis, 
tumours were sectioned and microdissected. Genomic DNA or RNA was 
extracted using the DNeasy kit (Qiagen) or Trizol (Invitrogen) as per the
manufacturer’s instructions. Data from TCGA tumours (n = 173) are publicly
available’. For the LGG validation set, expression data sets of 115 patients with 
grade II and grade III gliomas were identified from the NCI Repository for 
Molecular Brain Neoplasia Data (Rembrandt; http://rembrandt.nci.nih.gov). 
Sample preparation. DNA from wild-type IDH1, R132H IDH1 and parental 
astrocytes was extracted with the Puregene Cell and Tissue Kit (Qiagen) at various 
passages (passages 2, 5, 10, 15, 20, 25, 30, 40 and 50) and RNA was extracted with 
Trizol (Invitrogen) according to the manufacturer’s directions. All experiments 
with the astrocytes were performed in duplicate, each with two corresponding 
technical (microarray) replicates. Genomic DNA and RNA from human tumours 
were extracted from frozen primary tumours for the methylation and expression 
studies. Samples were snap frozen in liquid nitrogen and stored at −80 °C.
Each sample was examined histologically with haematoxylin-and-eosin-stained 
sections by a neuropathologist and representative sections were microdissected 
from the slides. Genomic DNA was extracted with the Qiagen DNeasy Blood and 
Tissue Kit using the manufacturer’s instructions. RNA was extracted with Qiagen 
RNeasy Lipid Tissue Mini Kit using the manufacturer’s instructions. Nucleic acid 
quality was determined with the Agilent 2100 Bioanalyzer. 

Genomic analysis. Expression analysis of astrocytes and tumours was performed 
using the Affymetrix U133 2.0 microarray (Affymetrix). Genome-wide methylation 
analysis was performed using the Infinium HumanMethylation450 bead array
(Illumina). Processing of the arrays was per the manufacturer’s protocol. Methylation
data were extracted using GenomeStudio software (Illumina). Methylation values for
each site are expressed as a β value, representing a continuous measurement from 0
(completely unmethylated) to 1 (completely methylated). This value is based on the
following calculation: β value = (signal intensity of methylation-detection probe)/
(signal intensity of methylation-detection probe + signal intensity of
non-methylation-detection probe).

Data analysis. For methylation analysis, Illumina data were imported into Partek
software. β values were logit-transformed and adjusted for batch effects before
analysis. Analysis of variance (ANOVA) with false discovery rate (FDR) correction was
used to identify genes that were differentially methylated between the astrocytes
expressing wild-type IDH1, mutant IDH1, and control astrocytes. Significant
changes were defined as genes having an FDR-corrected P value < 0.05. In human
tumours, unsupervised consensus clustering of the β values was performed with
K-means clustering (Kmax = 5) with Euclidean distance and average linkage over
1,000 resampling iterations with random restart on the top 2% of the most variant
probes (9,750 probes) using GenePattern v.2.0°'. This identified an optimal number
of K = 2 groups. The analysis was repeated with unsupervised hierarchical clustering
based on Pearson dissimilarity. The cluster of samples that exhibited a large degree of
hypermethylation was identified as CIMP+, and the remaining group as CIMP−.
ANOVA with FDR correction was used to identify genes that were differentially 
methylated between the CIMP groups. Significant changes were defined as genes 
having an FDR-corrected P value < 0.05. 
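The consensus clustering step can be sketched as follows: tumours are repeatedly subsampled, each subsample is clustered with K-means, and the consensus index for a pair of tumours is the fraction of subsamples containing both in which they landed in the same cluster (the 0-to-1 index shown in Fig. 2a). This is a simplified NumPy illustration under stated assumptions (80% subsampling of samples, plain Lloyd K-means with Euclidean distance and farthest-point initialization, a fixed seed, hypothetical data). It is not the GenePattern implementation, and it omits the selection of K.

```python
import numpy as np

def kmeans(X, k, rng, n_iter=100):
    """Plain Lloyd's K-means (Euclidean); returns a cluster label per row of X."""
    # Farthest-point initialization: random first centre, then the point farthest from existing centres
    centres = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centres], axis=0)
        centres.append(X[np.argmax(d2)])
    centres = np.array(centres, dtype=float)
    labels = None
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stable: converged
        labels = new_labels
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centre if a cluster empties
                centres[j] = members.mean(axis=0)
    return labels

def consensus_matrix(X, k, n_resamples=1000, frac=0.8, seed=0):
    """Consensus index for every pair of samples (rows of X)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    together = np.zeros((n, n))  # times a pair was assigned to the same cluster
    sampled = np.zeros((n, n))   # times a pair appeared in the same subsample
    m = max(k, int(round(frac * n)))
    for _ in range(n_resamples):
        idx = rng.choice(n, size=m, replace=False)
        labels = kmeans(X[idx], k, rng)
        same = (labels[:, None] == labels[None, :]).astype(float)
        together[np.ix_(idx, idx)] += same
        sampled[np.ix_(idx, idx)] += 1.0
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(sampled > 0, together / sampled, 0.0)

# Two well-separated hypothetical groups of 10 samples each, 5 features
data_rng = np.random.default_rng(1)
X = np.vstack([data_rng.normal(0, 0.3, (10, 5)), data_rng.normal(3, 0.3, (10, 5))])
C = consensus_matrix(X, k=2, n_resamples=200)
print(C[:10, :10].min(), C[:10, 10:].max())  # close to 1 within, close to 0 between groups
```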

For gene expression analysis of astrocytes, Affymetrix CEL files were imported
into the R statistical software (v.2.13.0; http://www.R-project.org). Normalization
was performed with the affyPLM package in Bioconductor (v.2.4), using RMA
background correction, quantile normalization, and the Tukey biweight summary
method. Differential expression was detected using the limma package and P
values were adjusted for multiple testing using the FDR approach. A probe set
was considered differentially expressed if its FDR-adjusted P value was < 0.05. For gene
expression analysis of the human tumours, the Affymetrix data were imported into
the Partek Genomics Suite (Partek) as Affymetrix CEL files. The data were RMA 
normalized and median-scaled for analysis. ANOVA followed by FDR was used to 
identify genes that were differentially expressed between the CIMP groups. To 
derive the 17-gene mutant IDH1 repression signature, we identified 605 unique 
genes that had either statistically significant hypermethylation at promoter- 
associated CpG islands and decreased gene expression, or had hypomethylation 
at promoter-associated CpG Islands and increased gene expression in CIMP+ 
versus CIMP— tumours. We identified common genes in a comparison of this 
gene set with that derived from mutant IDH1-expressing astrocytes versus wild 
type. Differential methylation in the cell lines was defined as an FDR-adjusted 
P value < 0.05, and differential expression was defined as a P value < 0.05 with 
concordant fold change of at least 1.5 fold. 
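The FDR correction and the concordance rule used to build the signature can be sketched as below. The Benjamini–Hochberg step-up procedure is shown because it is the standard "FDR approach" (the text names Benjamini–Hochberg explicitly only for the PANTHER analysis), and the function names and argument conventions (`meth_delta` as mutant minus wild-type methylation, `expr_fold` as a linear mutant/wild-type ratio) are hypothetical; this is a sketch, not the authors' code.

```python
def bh_adjust(pvalues: list) -> list:
    """Benjamini-Hochberg FDR-adjusted P values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def concordant_change(meth_delta: float, meth_fdr: float,
                      expr_fold: float, expr_p: float,
                      min_fold: float = 1.5, alpha: float = 0.05) -> bool:
    """True if methylation and expression change in opposite (concordant) directions.

    meth_delta: mutant minus wild-type methylation (positive = hypermethylated)
    expr_fold:  mutant / wild-type expression ratio on a linear scale
    """
    if meth_fdr >= alpha or expr_p >= alpha:
        return False
    if meth_delta > 0:  # hypermethylated promoter: expression should fall
        return expr_fold <= 1.0 / min_fold
    if meth_delta < 0:  # hypomethylated promoter: expression should rise
        return expr_fold >= min_fold
    return False


print(bh_adjust([0.01, 0.04, 0.03, 0.2]))
print(concordant_change(0.2, 0.01, 0.5, 0.01))
```

The running minimum is what makes the adjusted values monotone in the raw P values, matching the usual step-up definition.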

The 17-gene expression signature was used to predict CIMP status in the Rembrandt
data set. Unsupervised hierarchical clustering using Pearson dissimilarity identified
two unique clusters that were categorized as ‘predicted CIMP+’ and ‘predicted
CIMP−’. The 17-gene expression signature was also used to identify subgroups
in the TCGA GBM data set of 173 patients’. Unsupervised consensus clustering
using Pearson dissimilarity was performed on each of the subclasses identified by
the TCGA to identify clusters**. Rembrandt data sets were obtained at
http://caintegrator-info.nci.nih.gov/rembrandt.
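A minimal sketch of unsupervised hierarchical clustering with Pearson dissimilarity (1 − r) is given below, cutting the tree into two groups as was done to obtain 'predicted CIMP+' and 'predicted CIMP−'. Average linkage is an assumption for this step (the text specifies the linkage only for the GEDI analysis), and the toy profiles are invented; the naive O(n³) loop is for illustration, not for thousands of samples.

```python
import math
from statistics import mean


def pearson_dissimilarity(x: list, y: list) -> float:
    """1 - Pearson correlation between two equal-length profiles (range 0..2)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("profiles must not be constant")
    return 1.0 - sxy / math.sqrt(sxx * syy)


def average_linkage_clusters(samples: list, k: int = 2) -> list:
    """Agglomerative clustering with average linkage, cut at k clusters.

    samples: one expression profile (e.g. the signature genes) per sample.
    Returns clusters as lists of sample indices.
    """
    n = len(samples)
    dist = {(i, j): pearson_dissimilarity(samples[i], samples[j])
            for i in range(n) for j in range(i + 1, n)}
    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = mean(dist[min(i, j), max(i, j)]
                         for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


toy_profiles = [[1, 2, 3, 4], [2, 4, 6, 8.5], [4, 3, 2, 1], [8, 6, 4, 2.2]]
print(average_linkage_clusters(toy_profiles))  # samples 0,1 vs samples 2,3
```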

Functional analysis of gene lists was performed using the PANTHER database and 
categories with adjusted P values (Benjamini-Hochberg) < 0.05 were considered as 
significantly over-represented in our gene lists*’. Concepts module mapping was 
performed as follows. The hypermethylation signature identified from our analysis 
of differentially methylated genes in IDH1 mutants compared to IDH1 wild-type 
was imported into Oncomine (http://www.oncomine.org) to identify associations 
with molecular concepts signatures derived from independent cancer profiling stud- 
ies. Statistically significant concordances of our methylation gene signature with the 
pre-defined concepts were identified and Q value was calculated as previously 
described**. 

Methylation data of parental, wild-type IDH1-expressing astrocytes, and 
mutant IDH1-expressing astrocytes were clustered using self-organizing maps 
and visualized with the Gene Expression Dynamics Inspector (GEDI; v.2.1). For 
GEDI analysis, methylation data were normalized as a group across all passages 
and genotypes. Further hierarchical clustering (average-linkage) of GEDI map 
centroids was performed in R using the hclust function in the stats package.
Gene set enrichment analysis (GSEA) was performed using GSEA software v.2.0 and MSigDB database v.2.5.
We assessed the significance of the curated gene sets (MSigDB collection 2)
with the following parameters: number of permutations = 1,000 and
permutation_type = phenotype, with an FDR Q-value cut-off of 5% (ref. 35). 
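To illustrate what `permutation_type = phenotype` means, the sketch below shuffles the sample phenotype labels (not the genes) and recomputes a statistic to build a null distribution. As a simplifying assumption it uses a difference in group means for a single score rather than GSEA's running-sum enrichment score, so it shows the permutation logic only; the function name and the add-one P value convention are illustrative choices.

```python
import random
from statistics import mean


def phenotype_permutation_p(values: list, labels: list,
                            n_perm: int = 1000, seed: int = 0) -> float:
    """Two-sided permutation P value for a difference in group means.

    labels: True for one phenotype (e.g. mutant), False for the other.
    Phenotype labels are shuffled; the per-sample values are left in place.
    """
    def statistic(lbls):
        group_a = [v for v, lab in zip(values, lbls) if lab]
        group_b = [v for v, lab in zip(values, lbls) if not lab]
        return mean(group_a) - mean(group_b)

    observed = abs(statistic(labels))
    rng = random.Random(seed)
    shuffled = list(labels)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(statistic(shuffled)) >= observed:
            as_extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (as_extreme + 1) / (n_perm + 1)


labels = [True] * 4 + [False] * 4
print(phenotype_permutation_p([5, 6, 7, 8, 1, 2, 3, 4], labels))
```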

G-CIMP was compared to MGMT methylation and MGMT expression in 52 
LGG samples in the MSKCC cohort. We identified the Illumina 450K methylation 
probe ID (cg12981137) that corresponded to the MGMT MSP primer sequence as 
identified previously** and the Affymetrix probe ID (204880_at) that corresponds 
to MGMT expression. 

Clinical and pathological characteristics were compared between cohorts using
the χ2 test. Overall survival was calculated from the date of surgery to death from
any cause. Patients were censored at the time they were last known to be alive.
Overall survival was assessed using the Kaplan–Meier method, and the log-rank
test was used for comparison between groups. Multivariate analysis was performed
using a Cox proportional hazards model to assess the independent effect of
prognostic variables on outcome, and using binary logistic regression to predict
the probability of occurrence of CIMP+. A receiver operating characteristic (ROC)
curve was generated to graph the sensitivity and specificity of CIMP, MGMT
methylation, and MGMT expression to predict survival ≥3 years. MGMT methylation
and MGMT expression were considered continuous variables, and CIMP a categorical
variable (defined by unsupervised hierarchical analysis as described above). Patients
who were alive and had less than 3 years of follow-up were excluded from this
analysis. Data were analysed using SPSS software (IBM SPSS Statistics version 19.0).
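The Kaplan–Meier estimate described above multiplies, at each distinct death time, the fraction of at-risk patients who survive; censored patients leave the risk set without producing a step. The sketch below shows only the estimator (the log-rank test and Cox model were run in SPSS and are not reproduced); the function name and toy data are illustrative.

```python
def kaplan_meier(times: list, events: list) -> list:
    """Kaplan-Meier survival curve as [(time, S(time))] at each distinct death time.

    times:  follow-up time per patient (e.g. months from surgery)
    events: 1 if the patient died, 0 if censored (last known alive)
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients sharing this time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed  # deaths and censored patients leave the risk set
    return curve


print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```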
Quantitative DNA methylation analysis using mass spectrometry. DNA 
methylation analysis was performed using the EpiTYPER system (Sequenom). 
The EpiTYPER assay is a tool for the detection and quantitative analysis of 
DNA methylation using base-specific cleavage of bisulphite-treated DNA and 
matrix-assisted laser desorption/ionization time-of-flight mass spectrometry 
(MALDI-TOF MS)’. For primer sequences, target chromosomal sequence, and 
EpiTYPER-specific tags, see Supplementary Table 17. SpectroCHIPs were ana- 
lysed using a Bruker Biflex III MALDI-TOF mass spectrometer (SpectroREADER, 
Sequenom). Results were analysed using the EpiTYPER Analyzer software, and 
manually inspected for spectra quality and peak quantification. CIMP positivity
was defined as a mean methylated allelic frequency of >50% or a twofold increase
over normal breast tissue and the CIMP− state.
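The CIMP-positivity rule above can be written as a small decision function. This is a sketch under an interpretive assumption: "a twofold increase over normal tissue and the CIMP− state" is read as the sample mean being at least twice a single reference mean supplied by the caller. The function and argument names are hypothetical.

```python
from statistics import mean


def call_cimp_positive(cpg_methylation: list, reference_mean: float,
                       absolute_cutoff: float = 0.5, fold: float = 2.0) -> bool:
    """Classify a sample as CIMP+ from EpiTYPER CpG-unit methylation fractions.

    cpg_methylation: methylated allelic frequencies (0..1) for the assayed CpG units
    reference_mean:  mean methylation of the reference (normal tissue / CIMP- state)
    """
    m = mean(cpg_methylation)
    if m > absolute_cutoff:  # mean methylated allelic frequency above 50%
        return True
    # Otherwise require at least a `fold`-times increase over the reference.
    return reference_mean > 0 and m >= fold * reference_mean


print(call_cimp_positive([0.6, 0.7], reference_mean=0.3))  # above 50%
print(call_cimp_positive([0.2, 0.3], reference_mean=0.2))  # neither criterion met
```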
PCR amplification and sequencing. Exonic regions for the IDH1 and IDH2
genes (NCBI Human Genome Build 36.1) were broken into amplicons of 
500 bp or less, and specific primers were designed using Primer3. Standard M13 
tails were added to the primers to facilitate Sanger sequencing. PCR reactions were 
carried out in 384-well plates in a Duncan DT-24 water bath thermal cycler with 
10 ng of whole-genome amplified DNA (REPLI-g Midi, Qiagen) as a template, 
using a touchdown PCR protocol with KAPA Fast HotStart (Kapa Biosystems). 
The touchdown PCR method consisted of: 1 cycle of 95 °C for 5 min; 3 cycles of
95 °C for 30 s, 64 °C for 15 s, 72 °C for 30 s; 3 cycles of 95 °C for 30 s, 62 °C for 15 s,
72 °C for 30 s; 3 cycles of 95 °C for 30 s, 60 °C for 15 s, 72 °C for 30 s; 37 cycles of
95 °C for 30 s, 58 °C for 15 s, 72 °C for 30 s; 1 cycle of 70 °C for 5 min. Templates
were purified using AMPure (Agencourt Biosciences). The purified PCR reactions 
were split into two and sequenced bidirectionally with M13 forward and reverse 
primer and the Big Dye Terminator Kit v.3.1 (Applied Biosystems) at Agencourt 
Biosciences. Dye terminators were removed using the CleanSEQ kit (Agencourt 
Biosciences), and sequence reactions were run on ABI PRISM 3730xl sequencing 
apparatus (Applied Biosystems). Sanger sequencing of IDH1 and IDH2 produced 
an average coverage of 96.1% of coding sequence nucleotides across all samples. 
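The touchdown protocol above is easier to audit when encoded as data. The sketch below represents each block of the program as a stage with a cycle count and a list of (temperature, hold time) steps, then totals the cycles and programmed hold time. The class and function names are hypothetical; ramp times between temperatures, which depend on the cycler, are deliberately ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float
    seconds: int


@dataclass(frozen=True)
class Stage:
    cycles: int
    steps: tuple  # tuple of Step


def touchdown_program(touchdown_temps=(64, 62, 60), cycles_per_temp=3,
                      final_anneal_c=58, final_cycles=37):
    """Encode the touchdown protocol from the text as a list of stages."""
    denature, extend = Step(95, 30), Step(72, 30)
    program = [Stage(1, (Step(95, 5 * 60),))]  # initial denaturation
    for temp in touchdown_temps:  # annealing temperature drops 2 °C per block
        program.append(Stage(cycles_per_temp, (denature, Step(temp, 15), extend)))
    program.append(Stage(final_cycles, (denature, Step(final_anneal_c, 15), extend)))
    program.append(Stage(1, (Step(70, 5 * 60),)))  # final 5-min step at 70 °C
    return program


def total_cycles(program) -> int:
    return sum(stage.cycles for stage in program)


def hold_time_seconds(program) -> int:
    """Sum of programmed hold times; ramp times between temperatures are ignored."""
    return sum(stage.cycles * sum(step.seconds for step in stage.steps)
               for stage in program)


program = touchdown_program()
print(total_cycles(program), hold_time_seconds(program))
```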
Mutation detection. Passing reads were assembled against reference sequences, 
containing all coding exons including 5 kb upstream and downstream of the gene, 
using command line Consed 16.0°’. Assemblies were passed on to Polyphred 
6.02b**, which generated a list of putative candidate mutations, and to Polyscan 
3.0°°, which generated a second list of putative mutations. The lists were merged 
together into a combined report, and the putative mutation calls were normalized 
to ‘+’ genomic coordinates and annotated using the Genomic Mutation 
Consequence Calculator**. The resulting list of annotated putative mutations 
was loaded into a Postgres database along with select assembly details for each 
mutation call (assembly position, coverage, and methods supporting mutation 
call). To reduce the number of false positives generated by the mutation detection 
software packages, only point mutations that were supported by at least one 
bi-directional read pair and at least one sample mutation called by Polyphred were 
considered, and only the putative mutations that were annotated as having non- 
synonymous coding effects, occurred within 1 bp of an exon boundary, or had a 
conservation score > 0.699 were included in the final candidate list. Indels were 
manually reviewed and included in the candidate list if found to hit an exon. All 
putative mutations were confirmed by a second PCR and sequencing reaction, in 
parallel with amplification and sequencing of matched normal tissue DNA. 
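The automated filtering rules above can be summarized as a predicate over each putative call. This sketch assumes a simplified record (the `PutativeMutation` fields are hypothetical stand-ins for the database columns) and interprets "within 1 bp of an exon boundary" as a distance of at most 1 bp. Indels are not auto-accepted; they are only flagged for the manual review the text describes.

```python
from dataclasses import dataclass


@dataclass
class PutativeMutation:
    is_indel: bool
    bidirectional_read_pairs: int      # supporting bi-directional read pairs
    called_by_polyphred: bool
    nonsynonymous: bool
    distance_to_exon_boundary_bp: int  # 0 if the site lies on the boundary itself
    conservation_score: float
    hits_exon: bool = False


def passes_point_mutation_filter(m: PutativeMutation) -> bool:
    """Automated filter for point mutations, following the criteria in the text."""
    if m.is_indel:
        return False
    if m.bidirectional_read_pairs < 1 or not m.called_by_polyphred:
        return False
    return (m.nonsynonymous
            or m.distance_to_exon_boundary_bp <= 1
            or m.conservation_score > 0.699)


def indel_needs_review(m: PutativeMutation) -> bool:
    """Indels that hit an exon are flagged for manual review."""
    return m.is_indel and m.hits_exon


call = PutativeMutation(False, 1, True, True, 20, 0.2)
print(passes_point_mutation_filter(call))
```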
ChIP. Cells were fixed with 1% formaldehyde for 10 min at room temperature
(21 °C), and formaldehyde was inactivated by the addition of 125 mM glycine.
ChIP assays were performed using a protocol recommended by the manufacturer 
of a commercially available ChIP assay kit (17-371, Millipore). Chromatin extracts 
were immunoprecipitated using anti-H3K9me3 (Ab8898, Abcam) or anti- 
H3K27me3 (07-449, Millipore) antibodies. After washing, ChIPed DNA was 
eluted from the beads and analysed on an Eppendorf Realplex using SYBR 
Green (Applied Biosystems). Relative occupancy values were calculated by deter- 
mining ratios of the amount of immunoprecipitated DNA to that of the input 
sample (2% of total). 
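The occupancy calculation above can be sketched from qPCR threshold cycles (Ct). This assumes roughly 100% amplification efficiency, so each cycle doubles the product; the Ct-based formulation and the function names are assumptions, since the text states only that IP DNA was expressed as a ratio to the 2% input. The second function rescales the input to 100% of chromatin by subtracting log2(50) from its Ct, a common "percent input" convention.

```python
import math


def relative_occupancy(ct_ip: float, ct_input: float) -> float:
    """Ratio of immunoprecipitated DNA to the (2%) input sample itself."""
    return 2.0 ** (ct_input - ct_ip)


def percent_input(ct_ip: float, ct_input: float, input_fraction: float = 0.02) -> float:
    """ChIP-qPCR enrichment as a percentage of total chromatin.

    The input Ct is adjusted to represent 100% of the chromatin by subtracting
    log2(1 / input_fraction), i.e. log2(50) for a 2% input.
    """
    adjusted_input_ct = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (adjusted_input_ct - ct_ip)


# Hypothetical Ct values for one primer pair
print(relative_occupancy(26.0, 25.0))  # IP is one cycle later than input
print(percent_input(26.0, 25.0))
```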
Flow cytometry and 5hmC assay. HEK 293T cells were transiently transfected
with Flag-TET2 in pCMV6-ENTRY vector using Lipofectamine 2000 (GIBCO). For
two-colour flow cytometry, 10° cells were washed with ice-cold PBS, permeabilized 
and fixed using BD Cytoperm/Cytofix solution (BD, PharMingen), and incubated 
with anti-Flag (1:200, Sigma) and anti-5hmC (1:400, Active Motif #39770)
antibodies for 30 min at room temperature. Cells were washed with PBS and 
incubated with secondary antibodies conjugated with Alexa Fluor 488 or Cy5 
(Invitrogen) for 30 min in the dark. For single-colour flow cytometry, parental 
and IDH1 mutant astrocytes were stained using anti-5hmC (1:400) followed by
Alexa Fluor 488 secondary antibody. Cells were washed in PBS and analysed using 
the FACScan flow cytometer (Becton Dickinson). FACS data were analysed using 
FlowJo software (TreeStar).

Neurosphere assay. IDH1(R132H)-expressing astrocytes and parental controls
were grown in media permissive of neural stem cell growth as previously 
described’. Briefly, immortalized human astrocyte (IHA) cells stably expressing 
wild-type or R132H mutant IDH1 at passage 15 were seeded in 6-well plates at 
200,000 cells per well. The next day, proliferation medium (DMEM plus 10% FCS) 
was replaced with neural stem cell medium made from serum-free DMEM supple- 
mented with B27 and N2 supplements (all from Invitrogen), bFGF, EGF and 
PDGF-AA (all at 20 ng ml−1, all from PeproTech). Medium was replaced every
2–3 days. Neurospheres were quantified using microscopy. Experiments were
performed in triplicate.
Transformation by the (R)-enantiomer of
2-hydroxyglutarate linked to EGLN activation 
The identification of succinate dehydrogenase (SDH), fumarate
hydratase (FH) and isocitrate dehydrogenase (IDH) mutations in 
human cancers has rekindled the idea that altered cellular metabolism 
can transform cells. Inactivating SDH and FH mutations cause the 
accumulation of succinate and fumarate, respectively, which can 
inhibit 2-oxoglutarate (2-OG)-dependent enzymes, including the 
EGLN prolyl 4-hydroxylases that mark the hypoxia inducible factor 
(HIF) transcription factor for polyubiquitylation and proteasomal 
degradation’. Inappropriate HIF activation is suspected of con- 
tributing to the pathogenesis of SDH-defective and FH-defective 
tumours but can suppress tumour growth in some other contexts. 
IDH1 and IDH2, which catalyse the interconversion of isocitrate 
and 2-OG, are frequently mutated in human brain tumours and 
leukaemias. The resulting mutants have the neomorphic ability to 
convert 2-OG to the (R)-enantiomer of 2-hydroxyglutarate ((R)- 
2HG)”*. Here we show that (R)-2HG, but not (S)-2HG, stimulates 
EGLN activity, leading to diminished HIF levels, which enhances 
the proliferation and soft agar growth of human astrocytes. These 
findings define an enantiomer-specific mechanism by which the 
(R)-2HG that accumulates in IDH mutant brain tumours promotes 
transformation and provide a justification for exploring EGLN 
inhibition as a potential treatment strategy. 

To study the role of IDH mutations in brain tumours, we stably 
infected immortalized human astrocytes with retroviral vectors encod- 
ing haemagglutinin (HA)-tagged versions of wild-type IDH1, a tumour- 
derived mutant (IDH1 R132H)*”, or an IDH1 R132H variant in which 
three conserved aspartic acid residues within the IDH1 catalytic 
domain were replaced with asparagines (R132H/3DN) (Fig. 1a and
Supplementary Fig. 1). As expected, (R)-2HG levels, but not (S)- 
2HG levels, were markedly increased in the cells producing IDH1 
R132H but not in cells producing the R132H/3DN variant (Fig. 1b 
and Supplementary Fig. 2). In multiple independent experiments the 
IDH1 R132H cells acquired a proliferative advantage relative to cells 
producing the other versions of IDH1 beginning around passage 14, 
manifested as increased proliferation at confluence (Fig. 1c) and the 
ability to form macroscopic colonies in soft agar (Fig. 1d, e). 

Consistent with recent reports, we found that both (R)-2HG and 
(S)-2HG inhibit a number of 2-OG-dependent enzymes in vitro*®, 
including the collagen prolyl 4-hydroxylases, the TET1 and TET2 
methylcytosine hydroxylases, the HIF asparaginyl hydroxylase FIH1
(also known as HIF1AN) and the JMJD2D (also known as KDM4D) 
histone demethylase, with (S)-2HG being a more potent inhibitor than 
(R)-2HG (Supplementary Fig. 3 and data not shown). (S)-2HG was
also a micromolar to low millimolar inhibitor of the three mammalian
HIF prolyl 4-hydroxylases (EGLN1, EGLN2 and EGLN3) under
standard assay conditions, which included 10 µM 2-OG (Supplementary
Fig. 4). In contrast, (R)-2HG was not an effective EGLN inhibitor
(half-maximum inhibitory concentration (IC50) values > 5 mM;
Supplementary Fig. 4). Moreover, we discovered unexpectedly that 
(R)-2HG, but not (S)-2HG, promoted EGLN1 and EGLN2 activity, 
and to a lesser extent EGLN3 activity, at tumour-relevant concentra- 
tions (low mM) in reactions that lacked exogenous 2-OG (Fig. 2a, c, d 
and Supplementary Fig. 5a, b). This was specific because (R)-2HG did 
not promote collagen prolyl 4-hydroxylase activity (Fig. 2b) or 
JMJD2D histone demethylase activity (data not shown). Similar results 
were obtained with EGLN1 purified from either insect cells or 
Escherichia coli and with a heat-inactivated HIF polypeptide substrate 
(Supplementary Fig. 6), making it unlikely that the ability of (R)-2HG 
to promote EGLN activity required a contaminating enzyme. 

To determine how (R)-2HG might promote EGLN activity, we
monitored the EGLN1 prolyl 4-hydroxylase reaction using liquid
chromatography–mass spectrometry (LC-MS). 2-OG and succinate
were detected when catalytically active EGLN1 was incubated with
5 mM (R)-2HG and a recombinant HIF-1α polypeptide, suggesting
that EGLN1 can oxidize (R)-2HG to 2-OG, which is then decarboxylated
to succinate during the hydroxylation reaction (Fig. 2e and
Supplementary Fig. 5c). In support of this model, ¹³C-labelled succinate
was generated in EGLN hydroxylation assays that contained uniformly
labelled ¹³C-(R)-2HG (Supplementary Fig. 5d).

Figure 1 | Oncogenic properties of IDH1 R132H. a, b, Anti-HA immunoblot
(a) and LC-MS analysis (b) of immortalized human astrocytes (passage 4)
infected with retroviruses encoding HA-tagged versions of the indicated IDH1
variants. c, d, In vitro proliferation under standard culture conditions (c) or in
soft agar (d) (passage 23). Scale bars, 0.5 mm. e, Number of macroscopic soft
agar colonies in d (*P < 0.01, **P < 0.005). Error bars show standard deviation
(s.d.); n = 3.

Figure 2 | (R)-2HG can serve as an EGLN cosubstrate. a, b, In vitro prolyl
4-hydroxylation assays conducted with recombinant EGLN proteins (a) and
collagen P4H-I (b) in the presence of the indicated amounts of 2-OG or 2HG.
4-Hyp/Pro, ratio of 4-hydroxyproline to proline. L-[2,3,4,5-³H]proline-labelled
HIF-1α oxygen-dependent degradation domain (ODDD) (a) and [¹⁴C]proline-labelled
protocollagen (b) were used as substrates. Enzymes were produced in
insect cells using baculoviruses and affinity purified. Error bars, s.d.; n = 3–4.
c, d, Km (in µM) and maximum enzyme velocity (Vmax, percentage of that
obtained with 2-OG) values for (R)-2HG for EGLN family members. Km and
Vmax values for 2-OG”! are included for comparison. d.p.m., disintegrations per
minute. N.D., not defined. e, LC-MS analysis of succinate, 2-OG and (R)-2HG
from enzymatic reactions with EGLN1, HIF-1α ODDD polypeptide and either
5 mM (R)-2HG (red) or 80 µM 2-OG (black) as cofactors. Numbers next to
each peak indicate elution times (bottom) and peak areas (top). No peaks above
background were detected in samples in which 2-OG and (R)-2HG were both
omitted (data not shown). f, Model of (R)-2HG (green) and (S)-2HG (cyan)
bound to the active site of EGLN1. N-oxalylglycine (magenta) bound in the
original structure” is shown for comparison. The active-site water molecule,
which has been shown to be the O2-binding site”, is shown in red and the
peptide substrate in light blue. Hydrogen bonds are indicated by dashed lines.

Consistent with the idea that (R)-2HG can substitute for 2-OG as a
cosubstrate, the addition of increasing amounts of (R)-2HG to EGLN1
assays containing 10 µM 2-oxo-[1-¹⁴C]glutarate progressively decreased
the release of ¹⁴CO2 without decreasing the prolyl hydroxylation of
HIF-1α, even at concentrations as high as 100 mM (Supplementary
Fig. 7). Modelling of 2HG bound to the active site of EGLN1 predicts
that binding of the (S)-enantiomer, but not the (R)-enantiomer, would
prevent the subsequent recruitment of oxygen to the active site,
perhaps accounting for the qualitatively different effects of the two 
enantiomers on EGLN activity (Fig. 2f and Supplementary Fig. 5e). 

To investigate whether these findings were relevant in vivo we 
examined HIF levels in IDH1 mutant cells. In keeping with our
biochemical results, HIF-1α and HIF-2α protein levels were reproducibly
lower in mid-passage (passage 9–15) human astrocytes producing
IDH1 R132H relative to control astrocytes (Fig. 3a), due at least partly
to increased HIF-1α and HIF-2α hydroxylation and diminished
protein stability (Supplementary Figs 8 and 9), and were associated
with lower levels of the HIF-responsive messenger RNAs encoding
VEGF, GLUT1 and PDK1 (Supplementary Fig. 10a). Oxygen
consumption and reactive oxygen species production, which can also
affect HIF levels and the HIF response, were not measurably altered 
in the IDH1 mutant cells (Supplementary Fig. 11). IDH1(R132H)- 
expressing cells were relatively resistant to the 2-OG competitive 
antagonist dimethyloxalylglycine (DMOG), but not to the iron 
chelator deferoxamine (DFO) (Fig. 3b), consistent with (R)-2HG 
acting as a 2-OG agonist in intact cells. In later passages (more than 
passage 20), HIF levels in IDH1(R132H)-expressing immortalized 
human astrocytes began to normalize despite persistent production 
of the exogenous IDH1 protein and (R)-2HG (Supplementary Fig. 12), 
possibly due to adaptive HIF-responsive feedback loops such as those 
involving EGLN3 and microRNA miR-155 (refs 1, 7). These cells, 
however, retained the ability to form colonies in soft agar (data not 
shown) and remained addicted to EGLN activity (see later). 

Similarly, the induction of HIF-1α and HIF-responsive mRNAs by
hypoxia was diminished in two independent cell lines derived from 
two IDH1 R132H, 1p/19q-codeleted oligodendrogliomas compared to 
a control IDH1 wild-type, 1p/19q-codeleted oligodendroglioma line 
that had been generated in a similar fashion (Fig. 3c and Supplemen- 
tary Fig. 10b). Although these three cell lines have similar growth 
kinetics in vitro (data not shown) they are not isogenic. We therefore 
also tested two HCT116 colorectal cancer cell sublines wherein the 
R132H mutation was introduced into the endogenous IDH1 locus by 
homologous recombination. These sublines also showed a diminished 
HIF response compared to wild-type cells unless EGLN was pharma- 
cologically (DFO) or genetically (short hairpin (sh)RNA) inactivated 
(Fig. 3d, e and Supplementary Fig. 13). 

Finally, we asked whether IDH mutational status influenced HIF 
activity in primary patient astrocytoma samples using the TCGA 
expression data set® and a previously defined HIF-responsive gene 
expression signature’. The HIF signature was diminished in proneural 
tumours—which is the subtype most often associated with IDH muta- 
tions*—relative to other gene-expression-defined subtypes of brain 
cancer (Supplementary Fig. 14a, b) and, notably, was diminished in 
IDH mutant proneural tumours relative to wild-type proneural 
tumours (Fig. 3f and Supplementary Fig. 14c). Similar results were 
obtained with other previously published HIF gene sets and with a 
manually curated data set (Supplementary Fig. 14a, d). 

Downregulating HIF-1α with three independent shRNAs promoted
soft agar growth by immortalized human astrocytes after approxi- 
mately 15 passages (Fig. 4a, b and Supplementary Fig. 15), as did 
overproduction of wild-type, but not catalytic-defective, EGLN1 
(Fig. 4c, d and Supplementary Fig. 16). Conversely, downregulation 
of EGLN1 with multiple shRNAs inhibited the proliferation of late 
passage IDH1 R132H cells (Fig. 4e, f and Supplementary Fig. 17) 
unless HIF-1α was concurrently ablated (Fig. 4g, h and Supplementary
Fig. 18).

Collectively, these data suggest that EGLN activation by (R)-2HG,
and subsequent downregulation of HIF-1α, contributes to the
pathogenesis of IDH mutant gliomas. Our data do not, however, exclude the
possibility that (R)-2HG has additional targets, including TET2 and 
JmjC-containing histone demethylases, which contribute to its ability 
to transform cells**®. Indeed, we found that downregulation of TET2
in human astrocytes also promotes soft agar growth (Supplementary
Fig. 19). Nonetheless, our finding that (R)-2HG and (S)-2HG are
qualitatively different with respect to EGLN might explain the apparent
selection for (R)-2HG in adult tumours, despite the fact that (S)-2HG
is a more potent inhibitor of most of the 2-OG-dependent
enzymes tested so far. It should be noted, however, that (S)-2HG
has been linked to neurological abnormalities and brain tumours in
children and young adults with germline (S)-2HG dehydrogenase
mutations'*’. The pathogenesis of (R)-2HG-driven tumours linked
to somatic IDH mutations in adults and of (S)-2HG-driven tumours
linked to inborn errors of metabolism conceivably differ, with the latter
possibly reflecting the perturbation of one or more neurodevelopmental
programs during embryogenesis.

Nor are our data incompatible with the finding that HIF-1α protein
levels are increased in IDH mutant tumours relative to normal brain”.
Our data simply suggest that (R)-2HG quantitatively shifts the
dose-response relationship linking HIF activation to hypoxia, leading to a blunted HIF
response for a given level of hypoxia. In support of this idea, HIF
elevation in IDH mutant tumours is usually confined to areas of
necrosis and presumed severe hypoxia”.

Although HIF is typically viewed as an oncoprotein, it can behave as
a tumour suppressor in embryonic stem cells’*'®, leukaemic cells'” and
brain tumour cells’*'’. For example, HIF-1α has been shown to score as
an oncoprotein when transformed murine astrocytes are grown
subcutaneously but as a tumour suppressor when such cells are grown
orthotopically’’. This finding, together with our results, raises the
possibility that pharmacological inhibition of EGLN activity (and the
resulting increase in HIF activity) would impair the growth of IDH1
mutant tumours. A caveat, however, is that IDH mutant tumours tend
to be relatively indolent”®. It is possible that low levels of HIF-1α,
although promoting some aspects of transformation, simultaneously
suppress other hallmarks of cancer required for aggressive behaviour
(such as angiogenesis).

Figure 3 | HIF activity is diminished in IDH mutant cells. a–d, Immunoblot
analysis of immortalized human astrocytes (passage 10) (a, b),
oligodendroglioma cells (c) and HCT116 colorectal cells (d) expressing
the indicated IDH1 variants grown under 21% (N) or 7.5% (H) oxygen for 24 h
before lysis or under 21% oxygen in the presence of DFO (D). Names of cell
lines are indicated in brackets. In b, cells grown under 21% oxygen were treated
with increasing amounts of DMOG or DFO for 16 h before lysis. e, Quantitative
real-time PCR analysis of VEGF, EGLN3 and CAIX mRNAs in cells in d under
normoxic conditions. Error bars, s.d.; n = 3. f, Heat map depicting expression of
HIF target genes (blue, lower expression; red, higher expression) in proneural
tumours clustered based on IDH status (in the horizontal bar above the matrix,
red indicates mutant; green, wild type).


The published 2-OG Michaelis constant (Km) values for the EGLN
family members are below the estimated intracellular 2-OG
concentration*’”, suggesting that 2-OG should not be limiting for EGLN
activity and that EGLN activity would not be enhanced further by
(R)-2HG. A caveat is that a considerable amount of intracellular
2-OG appears to be sequestered in mitochondria and might also be
bound by other 2-OG-dependent enzymes”’. Moreover, these 2-OG
Km values were determined under idealized conditions with purified
enzymes and substrates in the absence of endogenous inhibitors such
as reactive oxygen species, nitric oxide and 2-OG-competitive
molecules such as succinate and fumarate’. Studies in model organisms
suggest that many metabolic enzymes are saturated in vivo with a
mixture of substrate and competitive inhibitors and thus are sensitive
to changes in substrate concentrations that are far above their nominal
Km values”***. We confirmed that succinate and fumarate increase the
2-OG requirement for the EGLN reaction and that their inhibitory
activity, like that of DMOG (Fig. 3b) and its active derivative
N-oxalylglycine, is blunted in the presence of (R)-2HG (Supplementary Fig. 20).
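The argument above (competitive inhibitors raise the apparent 2-OG requirement, and a second cosubstrate can offset them) can be illustrated with the textbook rapid-equilibrium rate law for two alternative substrates A and B competing for one site in the presence of a competitive inhibitor I. This model and all parameter values below are illustrative assumptions, not fitted constants from this study.

```python
def cosubstrate_velocity(s_a: float, km_a: float, vmax_a: float,
                         s_b: float = 0.0, km_b: float = 1.0, vmax_b: float = 0.0,
                         inhibitor: float = 0.0, ki: float = float("inf")) -> float:
    """Rapid-equilibrium rate for two alternative cosubstrates (A, B) that compete
    for the same site, plus a purely competitive inhibitor I.

    v = (Vmax_A*[A]/Km_A + Vmax_B*[B]/Km_B) / (1 + [A]/Km_A + [B]/Km_B + [I]/Ki)
    With B = I = 0 this reduces to Michaelis-Menten: Vmax_A*[A] / (Km_A + [A]).
    """
    a, b, i = s_a / km_a, s_b / km_b, inhibitor / ki
    return (vmax_a * a + vmax_b * b) / (1.0 + a + b + i)


# Hypothetical values (µM): limiting 2-OG (A) with a competitive inhibitor present,
# with and without a weak, high-Km alternative cosubstrate (B) at millimolar levels.
baseline = cosubstrate_velocity(1.0, 1.0, 1.0, inhibitor=10.0, ki=1.0)
with_b = cosubstrate_velocity(1.0, 1.0, 1.0, s_b=5000.0, km_b=1000.0, vmax_b=0.5,
                              inhibitor=10.0, ki=1.0)
print(baseline, with_b)
```

In this toy setting the alternative cosubstrate raises the rate even though its own Vmax is lower, because it adds productive binding that the inhibitor must also compete against.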

We have not yet formally proven that (R)-2HG is sufficient to 
downregulate HIF in intact cells. It is possible, for example, that addi- 
tional metabolic changes in IDH mutant cells sensitize them to the HIF 
modulatory effects of (R)-2HG”. Another, potentially related,
observation is that both downregulation of HIF and transformation by
mutant IDH in immortalized human astrocytes, although highly
reproducible, were only noted after multiple passages. HIF activates a
number of genes, including genes that participate in feedback
regulation of the HIF response and genes that modify chromatin structure’.
It is possible that modulation of the HIF response over time, perhaps in
conjunction with alterations in other enzymes affected by 2HG, leads
to epigenetic changes that ultimately are responsible for transforma- 
tion. If so, it will be important to determine the degree to which such 
changes are reversible. 
Figure 4 | Decreased HIF activity contributes to transformation by mutant
IDH. a, c, Soft agar colony formation by human astrocytes after stable infection
with lentiviruses encoding the indicated HIF-1α shRNAs or scrambled (scr)
shRNA (a) or lentiviruses encoding EGLN1, EGLN1 (P317R), or Venus
fluorescent protein (c; RNAi Consortium shRNA clone identifiers are indicated
in brackets). Scale bars, 0.5 mm. b, d, Number of macroscopic soft agar colonies
in a and c, respectively. *P < 0.05, **P < 0.005. Error bars, s.d.; n = 3.

e, f, Proliferation (e) and immunoblot analysis (f) of human astrocytes
expressing wild-type or R132H IDH1 and an shRNA against EGLN1 (10578)
(or scrambled control). g, h, Proliferation of human astrocytes expressing IDH1
R132H and shRNAs against HIF-1α (0819), EGLN1 (10578), or both.


METHODS SUMMARY 


NHAs immortalized with E6/E7/hTERT have been described elsewhere” and were 
subsequently infected with IDH retroviruses. IDH1 R132H HCT116 colorectal 
cancer cells were made by homologous recombination as previously described’. 
Anaplastic oligodendroglioma cell lines were derived from resection material 
obtained at surgeries performed at the Brigham and Women’s Hospital. 
Lentiviral shRNA vectors were from the Broad Institute. Immunoblots were
performed with anti-HIF-1α monoclonal antibody (BD Transduction Laboratories),
anti-HIF-2α polyclonal antibody (NB 100-122, Novus), anti-Flag monoclonal
antibody (M2, Sigma-Aldrich), monoclonal anti-HA (HA-11, Covance Research
Products), anti-EGLN1 monoclonal antibody (D31E11, Cell Signaling),
anti-GLUT1 polyclonal antibody (NB300-666, Novus Biologicals), monoclonal
anti-tubulin (B-512, Sigma-Aldrich) or anti-vinculin monoclonal antibody (V9131,
Sigma-Aldrich). Metabolite levels were determined by LC-MS as previously 
described’. Enzyme activity assays were performed with affinity-purified human
EGLN1, 2 and 3, collagen P4H-I and FIH, and murine Tet1 and Tet2 produced in
insect cells using baculoviral vectors. (R)-2HG and (S)-2HG were from Sigma. (R)- 
2HG was free of contaminating 2-OG as determined by LC-MS under conditions 
that could detect 0.37 µM exogenous 2-OG in 5 mM (R)-2HG (data not shown).
Human brain tumour mRNA expression data were obtained from The Cancer 
Genome Atlas and processed as described*’*. Cell proliferation assays were 
performed using XTT assays (Cell Proliferation Kit II, Roche) according to the
manufacturer’s instructions or by direct counting of viable cells using an 
automated cell counter (Invitrogen). For soft agar assays ~8,000 cells were sus- 
pended in a top layer of 0.4% soft agar (SeaPlaque Agarose, BMA products) and 
plated on a bottom layer of 1% soft agar containing DMEM supplemented with 
10% fetal bovine serum. For real-time qPCR analysis total RNAs were first 
extracted with Trizol reagent (Invitrogen). cDNA synthesis and PCR amplification 
were performed with Superscript One-Step RT-PCR (Invitrogen). 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS

Cell lines. The NHA cell line immortalized with E6/E7/hTERT has been described 
elsewhere*® and was maintained in DMEM containing 10% fetal bovine serum 
(FBS) and 1% penicillin/streptomycin in the presence of 10% CO2 at 37 °C.
Following retroviral infection, cells were maintained in the presence of hygromycin 
(100 µg ml−1).

Introduction of the IDH1 R132H mutation into HCT116 colon cancer cell lines 
by homologous recombination was as previously described”’. Briefly, targeting 
constructs were designed using the pSEPT rAAV shuttle vector’'. Homology 
arms for the targeting vector were PCR amplified from HCT116 genomic DNA 
using Platinum Taq HiFi polymerase (Invitrogen). The R132H hotspot mutation 
was introduced in the targeting construct using the Quickchange II site-directed 
mutagenesis kit (Stratagene). An infectious rAAV stock harbouring the targeting 
sequence was generated and applied to parental HCT116 cells as previously 
described*’, and clones were selected in 0.5 mg ml−1 geneticin (Invitrogen).
Excision of the selectable element was achieved with an adenovirus encoding 
Cre recombinase (Vector Biolabs). Genomic DNA and total RNA were isolated 
from cells using QLAmp DNA Blood Kit and RNeasy kit (Qiagen). First strand 
cDNA was synthesized using iScript cDNA Synthesis Kit (BioRad). Successful 
homologous recombination and Cre-mediated excision were verified using 
PCR-based assays and by direct sequencing of genomic DNA and cDNA. 

Anaplastic oligodendroglioma cell lines expressing wild-type IDH1 (BT260) or
IDH1 R132H (BT237 and BT138) were derived from surgical resection material
acquired from patients undergoing surgery at the Brigham and Women’s Hospital
under an Institutional Review Board-approved protocol. Briefly, tumour resection
samples were mechanically dissociated, and tumourspheres were established and
propagated in Human NeuroCult NS-A Basal medium (StemCell Technologies)
supplemented with EGF, FGFb and heparin sulphate. All lines were obtained from
the DF/BWCC Living Tissue Bank and confirmed to be derived from recurrent
and progressive anaplastic oligodendrogliomas (World Health Organization grade
III) at the time of cell-line isolation, and to have chromosome 1p/19q codeletion.
The presence of the IDH1 R132H mutation was confirmed by mutant-specific antibody
staining via immunohistochemistry and by direct DNA sequencing.
Vectors. Human IDH1 cDNAs (wild type and R132H) were subcloned as BamHI–EcoRI
fragments into pBabe-HA-hygro. Aspartic acid residues 273, 275 and 279
were mutated to asparagine (N) by site-directed mutagenesis and confirmed by
DNA sequencing. EGLN1, EGLN2 and EGLN3 cDNAs were subcloned into the
pLenti6-Flag expression vector after restriction with XbaI and EcoRI.
pLenti6-Flag-EGLN1 P317R*’ was prepared by QuikChange mutagenesis (Stratagene)
using pLenti6-Flag-EGLN1 as a template and the following oligonucleotides:
5'-GTACGTCATGTTGATAATCGAAATGGAGATGGAAGATCTG-3' and
5'-CACATGTTCCATCTCCATTTCGATTATCAACATGACGTAC-3'. The entire
EGLN1 coding region was sequenced to verify its authenticity.

Lentiviral (pLKO.1) HIF-1α shRNA vectors (TRCN0000003810, target
sequence: 5'-GTGATGAAAGAATTACCGAAT-3'; TRCN0000010819, target
sequence: 5'-TGCTCTTTGTGGTTGGATCTA-3'; TRCN0000003809, target
sequence: 5'-CCAGTTATGATTGTGAAGTTA-3'), EGLN1 (PHD2) shRNA
vectors (TRCN0000001042, target sequence: 5'-CTGTTATCTAGCTGAGTTCAT-3';
TRCN0000001043, target sequence: 5'-GACGACCTGATACGCCACTGT-3';
TRCN00000010578, target sequence: 5'-TGCACGACACCGGGAAGTTCA-3') and
TET2 shRNA vectors (TRCN0000122172 (122), target sequence:
5'-GCGTTTATCCAGAATTAGCAA-3'; TRCN0000144344 (145), target sequence:
5'-CCTTATAGTCAGACCATGAAA-3') were obtained from the Broad Institute
TRC shRNA library. pLKO.1 shRNA with target sequence
5'-GCAAGCTGACCCTGAAGTTCAT-3' was used as the negative control shRNA.
Immunoblot analysis. Cell extracts were prepared with 1× lysis buffer (50 mM
Tris (pH 8.0), 120 mM NaCl, 0.5% NP-40) supplemented with a protease inhibitor
cocktail (Complete, Roche Applied Science), resolved on 10% SDS-PAGE gels and
transferred to nitrocellulose membranes (Bio-Rad). Membranes were blocked in
TBS with 5% non-fat milk and probed with anti-HIF-1α monoclonal antibody
(BD Transduction Laboratories), anti-HIF-2α polyclonal antibody (NB100-122,
Novus), anti-Flag monoclonal antibody (M2, Sigma-Aldrich), mouse monoclonal
anti-HA (HA-11, Covance Research Products), anti-EGLN1 (PHD2) monoclonal
antibody (D31E11, Cell Signaling), anti-GLUT1 polyclonal antibody (NB300-666,
Novus Biologicals), mouse monoclonal anti-tubulin (B-512, Sigma-Aldrich) or
anti-vinculin monoclonal antibody (Sigma-Aldrich). Bound proteins were
detected with horseradish-peroxidase-conjugated secondary antibodies (Pierce)
and Immobilon Western chemiluminescent horseradish peroxidase substrate
(Millipore).

LC-MS. Metabolite levels in samples were determined by negative-mode
electrospray LC-MS as previously described’. Briefly, metabolites were extracted from
exponentially growing cells using 80% aqueous methanol (−80 °C) and were
profiled by LC-MS. (R)-2HG, 2-oxoglutarate and succinate were quantified by
LC-MS in negative mode using multiple reaction monitoring (MRM) on a
Quattro micro triple quadrupole mass spectrometer (Waters). Samples were diluted
with an equal volume of 25% acetonitrile and 10 μl aliquots were analysed in
triplicate. The HPLC column (Luna NH2, 3 μm, 2.0 × 100 mm, Phenomenex) was
operated isocratically with 130 mM ammonium acetate pH 5.0 in 37% acetonitrile/
water. The MRM transitions (precursor > product m/z) were: 117 > 73, 117 > 93
(succinate), 145 > 57, 145 > 101 (2-OG) and 147 > 85, 147 > 129 ((R)-2HG), 0.2 dwell
time for all transitions. Calibration curves were set up with QuanLynx, using standards
dissolved in reaction buffer.
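Quantification against calibration curves, as described above, amounts to fitting a straight line to the peak areas of the standards and back-calculating unknowns from their (replicate-averaged) peak areas. The Methods state that the vendor software named above was used for this; the block below is only a minimal, hypothetical Python sketch of the same idea, assuming a linear detector response and an ordinary least-squares fit. The function names, concentrations and peak areas are invented for illustration.

```python
from statistics import fmean


def fit_calibration(concs, areas):
    """Ordinary least-squares fit of peak area = slope * concentration + intercept.

    concs: concentrations of the calibration standards (e.g. in uM)
    areas: MRM peak areas measured for those standards
    """
    mx, my = fmean(concs), fmean(areas)
    sxx = sum((x - mx) ** 2 for x in concs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(concs, areas))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


def quantify(replicate_areas, slope, intercept):
    """Average replicate injections (e.g. triplicates), then invert the fitted line."""
    mean_area = fmean(replicate_areas)
    return (mean_area - intercept) / slope


# Hypothetical standards and a hypothetical sample analysed in triplicate.
standards_uM = [0, 1, 5, 10, 50]
standard_areas = [10, 12, 20, 30, 110]
slope, intercept = fit_calibration(standards_uM, standard_areas)
sample_uM = quantify([108, 110, 112], slope, intercept)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, sample={sample_uM:.1f} uM")
```

A weighted fit (for example 1/x weighting) is often preferred when standards span several orders of magnitude; the unweighted version is kept here for brevity.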

Enzyme activity assays. (R)-2HG (H8378) and (S)-2HG (S765015) were from
Sigma. (R)-2HG was free of contaminating 2-OG as determined by LC-MS under
conditions that could detect 0.37 μM exogenous 2-OG in 5 mM (R)-2HG (data not
shown). Human EGLN1, 2 and 3, collagen P4H-I and FIH, and murine Tet1 and
Tet2 were produced in insect cells and purified as described earlier***’. The
plasmids to generate the baculoviruses coding for Tet1 and 2 were a gift from Y.
Zhang (University of North Carolina). IC₅₀ values for (R)-2HG and (S)-2HG were
determined based on the hydroxylation-coupled stoichiometric release of ¹⁴CO₂
from 2-oxo-[1-¹⁴C]glutarate using synthetic peptides or double-stranded
oligonucleotides representing the natural targets of the studied enzymes as substrates.
These were DLDLEMLAPYIPMDDDFQL (DLD19) for EGLN1, 2 and 3, (PPG)₁₀
for collagen P4H-I, DESGLPQLTSYDCEVNAPIQGSRNLLQGEELLRAL for FIH
and 5'-CTATACCTCCTCAACTT(mC)GATCACCGTCTCCGGCG-3' for Tet1
and 2. The Kᵢ values of (S)-2HG for EGLN1, 2 and 3 were determined by adding
(S)-2HG at four constant concentrations while varying the concentration of
2-oxo-[1-¹⁴C]glutarate.
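Once residual activities have been measured, the IC₅₀ and Kᵢ determinations described above reduce to short calculations. The block below is a hypothetical sketch, not the authors' analysis code. It assumes log-linear interpolation between the two inhibitor concentrations that bracket 50% activity for the IC₅₀, and a purely competitive inhibition model, apparent Kₘ = Kₘ(1 + [I]/Kᵢ), for the Kᵢ, where the apparent Kₘ of 2-OG has been obtained at each fixed inhibitor concentration. Function names and numbers are illustrative only.

```python
import math
from statistics import fmean


def ic50_interpolate(concs, activity_pct):
    """IC50 by log-linear interpolation between the two inhibitor
    concentrations that bracket 50% residual activity.

    concs: increasing inhibitor concentrations (> 0)
    activity_pct: enzyme activity at each concentration, % of uninhibited control
    """
    pairs = list(zip(concs, activity_pct))
    for (c1, a1), (c2, a2) in zip(pairs, pairs[1:]):
        if a1 >= 50 >= a2 and a1 != a2:
            frac = (a1 - 50) / (a1 - a2)
            log_c = math.log10(c1) + frac * (math.log10(c2) - math.log10(c1))
            return 10 ** log_c
    raise ValueError("50% activity is not bracketed by the data")


def ki_competitive(inhibitor_concs, km_apparent):
    """Ki under an assumed competitive model: Km_app = Km * (1 + [I] / Ki).

    A straight-line fit of Km_app against [I] has intercept Km and
    slope Km / Ki, so Ki = intercept / slope.
    """
    mx, my = fmean(inhibitor_concs), fmean(km_apparent)
    sxx = sum((x - mx) ** 2 for x in inhibitor_concs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(inhibitor_concs, km_apparent))
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept / slope


# Hypothetical data: activity falls to 50% at 100 uM inhibitor.
ic50 = ic50_interpolate([1, 10, 100, 1000], [95, 80, 50, 10])
# Hypothetical apparent Km values of 2-OG (uM) at four inhibitor concentrations (uM).
ki = ki_competitive([0, 50, 100, 200], [1.0, 1.5, 2.0, 3.0])
print(f"IC50 = {ic50:.1f} uM, Ki = {ki:.1f} uM")
```

Interpolation is model-free but crude; fitting a full dose-response curve would normally be preferred when enough concentrations are available.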

To study whether (R)-2HG, which failed to efficiently inhibit EGLN activity,
could promote EGLN activity by acting as a cofactor in place of 2-OG, we
determined the amount of 4-hydroxy[³H]proline formed by a specific radiochemical
procedure** using an L-[2,3,4,5-³H]proline-labelled HIF-1α ODDD as a substrate.
Collagen P4H-I and a [¹⁴C]proline-labelled protocollagen® substrate were
used as controls (detecting 4-hydroxy[¹⁴C]proline), and 2-OG (non-labelled) and
(S)-2HG were assayed for comparison. The Kₘ values of EGLN1 and EGLN2 for (R)-2HG
were determined by adding increasing amounts of (R)-2HG while the concentrations of
the substrate and other cofactors were kept constant.
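Estimating a Kₘ for (R)-2HG from rates measured at increasing (R)-2HG concentrations can be illustrated with a double-reciprocal (Lineweaver–Burk) fit. This is a swapped-in textbook technique chosen for brevity, not necessarily the fitting method the authors used; nonlinear Michaelis–Menten regression is generally more robust to measurement noise. The function name and data are hypothetical.

```python
from statistics import fmean


def michaelis_menten_lb(substrate, rates):
    """Estimate Vmax and Km from a Lineweaver-Burk (double-reciprocal) fit.

    1/v = (Km / Vmax) * (1/[S]) + 1/Vmax, so a straight line through
    the points (1/[S], 1/v) has slope Km/Vmax and intercept 1/Vmax.
    """
    xs = [1 / s for s in substrate]
    ys = [1 / v for v in rates]
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    vmax = 1 / intercept
    km = slope * vmax
    return vmax, km


# Hypothetical noise-free rates generated from Vmax = 10 and Km = 2 (arbitrary units).
conc = [0.5, 1, 2, 4, 8]
rates = [10 * s / (2 + s) for s in conc]
vmax, km = michaelis_menten_lb(conc, rates)
print(f"Vmax={vmax:.2f}, Km={km:.2f}")
```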

Generation of a recombinant HIF-1α substrate. The HIF-1α ODDD substrate,
spanning residues 356–603 of human HIF-1α, was produced in the E. coli strain
BL21(DE3) (Novagen) in the presence of L-[2,3,4,5-³H]proline (75 Ci mmol⁻¹,
PerkinElmer Life Sciences) and affinity purified on a chelating Sepharose column
charged with Ni²⁺ (ProBond, Invitrogen) exploiting a C-terminal His tag".
The concentration of the purified substrate was measured by RotiQuant (Carl Roth
GmbH), and the substrate was used at Kₘ concentrations for the distinct EGLN enzymes”.
Modelling. The EGLN1 active-site structure (PDB accession 3HQR”) with
N-oxalylglycine (NOG, magenta), Mg²⁺ and a peptide substrate (light blue)
was used to model (R)-2HG (green) and (S)-2HG (cyan) into the active site.
Gene expression profiling and gene-set enrichment analysis. Expression data
from human brain tumour samples were obtained from The Cancer Genome Atlas
(http://tcga-data.nci.nih.gov/tcga/tcgaHome2.jsp) and processed as described*”*.
In short, 200 expression profiles from glioblastoma multiforme and two
non-neoplastic brain samples were generated using three platforms (Agilent 244K,
Affymetrix HT-HG-U133A, Affymetrix HuEx), preprocessed using gene-centric
probe sets, and the three expression values were integrated through factor analysis.
After consensus clustering using 1,740 variably expressed genes, profiles with a
negative silhouette metric were identified and removed from the data set, leaving
expression profiles from 173 glioblastoma tumour samples. IDH1 mutation status
was established for 116 of the 173 samples.
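The filtering step above, removing profiles with a negative silhouette after clustering, can be sketched as follows. This is a hypothetical illustration that assumes Euclidean distances between expression profiles; the distance measure and software actually used are not specified in this section, and the toy two-gene profiles are invented.

```python
import math


def silhouette_scores(profiles, labels):
    """Silhouette s(i) = (b - a) / max(a, b) for each profile.

    a: mean distance to the other members of its own cluster
    b: smallest mean distance to the members of any other cluster
    Requires at least two clusters; singletons get a score of 0 by convention.
    """
    n = len(profiles)
    dist = [[math.dist(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]
    clusters = set(labels)
    scores = []
    for i in range(n):
        own = [dist[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(
            sum(dist[i][j] for j in range(n) if labels[j] == k)
            / sum(1 for j in range(n) if labels[j] == k)
            for k in clusters
            if k != labels[i]
        )
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return scores


# Hypothetical 2-gene profiles; the last one is assigned to cluster "A"
# but lies next to cluster "B", so its silhouette is negative.
profiles = [(0, 0), (0, 1), (10, 10), (10, 11), (9, 10)]
labels = ["A", "A", "B", "B", "A"]
scores = silhouette_scores(profiles, labels)
kept = [i for i, s in enumerate(scores) if s >= 0]
print(kept)  # [0, 1, 2, 3]
```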

Eleven gene sets representing the response to induced hypoxia were taken from
Supplementary Table 6 of ref. 41. One gene set was taken from Figure 4C of ref. 9. A
final set was assembled by one of us (W.G.K.) based on a literature review of
genes upregulated by HIF in a wide variety of cell types (CA9, EGLN1, EGLN3,
SLC2A1, BNIP3, ADM, VEGF, PDK1, LOX, PLOD1, CXCR4, P4HA1,
ANKRD37). Single-sample gene-set enrichment analysis (GSEA) was applied
as reported previously*. Briefly, genes were ranked by their expression values.
The empirical cumulative distribution functions (ECDFs) of both the genes in the
signature and the remaining genes were calculated. An enrichment score
was obtained by summing the difference between a weighted ECDF of the genes in
the signature and the ECDF of the remaining genes. This calculation was
repeated for all signatures and samples. A Z-score transformation was applied to
make scores from different gene sets comparable. A positive score
indicates gene-set activation; a negative value does not indicate inactivation, but
rather a lack of effect.
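The single-sample enrichment score described above can be sketched in a few lines. This is an illustrative reimplementation under stated assumptions, not the published code: genes are ranked by expression, signature genes are weighted by their rank raised to an assumed exponent alpha = 0.25, and the running difference between the weighted signature ECDF and the ECDF of the remaining genes is summed over every position in the ranking. HIF_TARGETS reproduces the literature-based HIF gene list given in the Methods; the expression values and the names GENE1 and GENE2 are hypothetical.

```python
from statistics import fmean, pstdev

# Literature-based HIF target set listed in the Methods.
HIF_TARGETS = {"CA9", "EGLN1", "EGLN3", "SLC2A1", "BNIP3", "ADM", "VEGF",
               "PDK1", "LOX", "PLOD1", "CXCR4", "P4HA1", "ANKRD37"}


def ssgsea_score(expression, gene_set, alpha=0.25):
    """Single-sample enrichment score for one sample and one gene set.

    expression: dict gene -> expression value for a single sample
    gene_set: signature genes (genes absent from `expression` are ignored)
    alpha: rank-weighting exponent (an assumed value, not taken from the paper)
    """
    ordered = sorted(expression, key=expression.get, reverse=True)
    n = len(ordered)
    rank = {g: n - i for i, g in enumerate(ordered)}  # top-ranked gene gets rank n
    in_set = [g for g in ordered if g in gene_set]
    n_out = n - len(in_set)
    if not in_set or n_out == 0:
        raise ValueError("gene set must be a non-empty proper subset of the genes")
    total_weight = sum(rank[g] ** alpha for g in in_set)
    ecdf_in = ecdf_out = score = 0.0
    for g in ordered:
        if g in gene_set:
            ecdf_in += rank[g] ** alpha / total_weight  # weighted ECDF of the signature
        else:
            ecdf_out += 1 / n_out  # ECDF of the remaining genes
        score += ecdf_in - ecdf_out
    return score


def z_scores(values):
    """Z-transform one gene set's scores across samples."""
    mu, sd = fmean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


# Hypothetical sample in which two HIF targets are the most highly expressed genes.
sample = {"CA9": 10.0, "BNIP3": 9.0, "GENE1": 1.0, "GENE2": 0.5}
up = ssgsea_score(sample, HIF_TARGETS)          # positive: signature genes ranked high
down = ssgsea_score(sample, {"GENE1", "GENE2"})  # negative: signature genes ranked low
print(up, down, z_scores([1.0, 2.0, 3.0]))
```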

Cell proliferation assays. Cells were plated in 96-well plates (~700 cells per well)
with a media change every 3 days. The number of viable cells per well at each
time point was measured using an XTT assay (Cell Proliferation Kit II, Roche)
according to the manufacturer’s instructions. Spectrophotometric absorbance at
450 nm was measured 5–6 h after adding the XTT labelling reagent/electron-coupling
reagent using a microtiter plate reader (Perkin Elmer Life and Analytical
Science). For direct cell counting, cells were plated in p60 dishes (~50,000 cells
per dish) with a media change every 3 days. The number of viable cells at each time
point was measured after trypan blue staining by using an automated cell counter
(Invitrogen) according to the manufacturer’s instructions.

©2012 Macmillan Publishers Limited. All rights reserved

Soft agar colony formation assay. Approximately 8,000 cells were suspended in 
a top layer of 0.4% soft agar (SeaPlaque Agarose, BMA products) and plated on 
a bottom layer of 1% soft agar containing complete DMEM supplemented 
with 10% FBS in 6-well plates. After 3-4 weeks, colonies were stained with 0.1% 
iodonitrotetrazolium chloride (Sigma-Aldrich). 

Real-time qPCR analysis. Total RNAs were extracted with Trizol reagent
(Invitrogen). cDNA synthesis and PCR amplification were performed with
Superscript One-Step RT-PCR (Invitrogen) with 2 μg total RNA. EGLN3
cDNA was amplified with sense primer (5'-GCGTCTCCAAGCGACA-3') and
antisense primer (5'-GTCTTCAGTGAGGGCAGA-3'). VEGF cDNA was amplified
with sense primer (5'-CGAAACCATGAACTTTCTGC-3') and antisense
primer (5'-CCTGAGTGGGCACACACTCC-3'). HIF1A cDNA was amplified
with sense primer (5'-TATTGCACTGCACAGGCCACATTC-3') and antisense
primer (5'-TGATGGGTGAGGAATGGGTTCACA-3'). HIF2A cDNA was amplified
with sense primer (5'-ACAAGCTCCTCTCCTCAGTTTGCT-3') and antisense
primer (5'-ACCCTCCAAGGCTTTCAGGTACAA-3'). GLUT1 CAIX cDNA was
amplified with sense primer (5'-TGGAAGAAATCGCTGAGGAAGGCT-3') and
antisense primer (5'-AGCACTCAGCATCACTGTCTGGTT-3'). PDK1 cDNA
was amplified with sense primer (5'-ATGATGTCATTCCCACAATGGCCC-3')
and antisense primer (5'-TGAACATTCTGGCTGGTGACAGGA-3'). As a control,
β-actin cDNA was amplified with sense primer (5'-ACCAACTGGGACGACATGGAGAAA-3')
and antisense primer (5'-TAGCACAGCCTGGATAGCAACGTA-3').
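The Methods name β-actin as the control amplicon but do not state how relative expression was calculated. One common choice is the comparative 2^−ΔΔCt (Livak) method, sketched below purely as an assumption: it presumes near-100% amplification efficiency for both target and reference amplicons, and the function name and Ct values are hypothetical.

```python
from statistics import fmean


def relative_expression(ct_target, ct_ref, ct_target_ctrl, ct_ref_ctrl):
    """Fold change by the comparative 2^-ddCt method (an assumed analysis).

    Each argument is a list of replicate Ct values: the target gene and the
    reference gene (e.g. beta-actin) in the test sample, then the target and
    reference genes in the control (calibrator) sample.
    """
    d_ct_sample = fmean(ct_target) - fmean(ct_ref)
    d_ct_ctrl = fmean(ct_target_ctrl) - fmean(ct_ref_ctrl)
    dd_ct = d_ct_sample - d_ct_ctrl
    return 2 ** (-dd_ct)


# Hypothetical triplicate Ct values for a HIF target gene such as EGLN3.
fold = relative_expression([22.0, 22.1, 21.9], [18.0, 18.0, 18.0],
                           [24.0, 24.0, 24.0], [18.0, 18.0, 18.0])
print(round(fold, 3))  # 4.0
```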
Evolution after tumour spread


A genetic study of brain cancers in mice and humans reveals distinct mutations in primary tumours and their metastases, suggesting that the two disease ‘compartments’ may require different treatments.


STEVEN C. CLIFFORD 


The spread of a primary tumour to secondary sites in the body is a key step
in the development of many cancers, and treatment of these secondary metastatic
tumours represents one of the foremost challenges in oncology. In an article
published on Nature’s website today, Wu et al.’ describe two new mouse strains
that serve as models of metastasis in the childhood brain cancer
medulloblastoma. In the mice, primary and metastatic tumours seem to occupy two
genetically distinct ‘compartments’, which arise from divergent DNA-sequence
mutations that occur after metastasis. The authors also detect similar
differences in human medulloblastoma tumours, a finding that may influence the
development of anticancer therapies.

Wu et al. used an experimental system called Sleeping Beauty mutagenesis” to
introduce random genetic mutations into cerebellar progenitor cells in the
developing brains of two strains of mice. These two new strains were derived by
breeding the existing Tp53””" and Ptch’’ strains, which are predisposed to brain
tumours™, with a strain that expresses the Sleeping Beauty mutagen in cerebellar
progenitors. This system leaves a unique genetic ‘footprint’ at each mutation
site, which allows mutated genes to be identified by DNA sequencing. Such
mutagenesis experiments are used to identify genes in which mutations frequently
arise, because such genes are reasoned to be more likely than average to be
involved in tumour development”.

As in mouse models of other cancer types”, Wu and colleagues’ Sleeping Beauty
mutagenesis accelerated the development of medulloblastoma in both mouse
strains. The authors identified a range of new and established cancer-related
genes that had mutations, including some that have previously been implicated in
medulloblastoma. They also observed that, following mutagenesis, mice of both
strains developed metastases around a type of brain tissue called the
leptomeninges, in patterns that are reminiscent of metastatic human
medulloblastoma’. The two mouse models thus provided an opportunity to track
mutations present in the primary and metastatic disease, and to investigate
their genetic provenance.
Figure 1 | A bi-compartmental genetic model of cancer metastasis. By analysing the tumours from
two strains of mice that model the brain cancer medulloblastoma, Wu et al.' found differences in the
DNA-sequence mutations present in primary and metastatic tumours. They propose that rare cells in the
primary tumour that are capable of metastasizing disperse to other sites in the brain, where they form
metastases. The cells of the primary tumour and the metastases then continue to accumulate mutations,
generating two distinct genetic compartments.


Wu et al. found that there were, in general, only a few mutations common to
primary and metastatic tumours from the same mouse, but that the mutations in
different metastases from the same mouse tended to be more similar to each
other. Moreover, certain mutations observed in metastases were detected at only
low levels within the primary tumour, and some mutations were unique to one or
the other tumour type. The authors conclude that their findings are consistent
with a model in which metastases originate from rare cells in the primary
tumour, and that, following metastasis, additional mutations accumulate
independently, both in the primary tumour (post-dispersion events) and in
metastases (post-metastasis events) (Fig. 1).

Turning our attention away from mice, an obvious question is whether primary and
metastatic tumours in the human disease also show ‘bi-compartmental’ genetics.
Approximately 30% of patients with medulloblastoma already have metastases when
they are first diagnosed, and this is associated with a poor prognosis’.
However, few previous studies have compared the biology of human primary
tumours with their associated metastases, mainly because metastases are not
routinely biopsied. Despite the limited sample availability, Wu et al.’ show
initial evidence of differing genetics in primary and metastatic tumours from
seven human patients.

Further investigation is required to establish whether the authors’ findings
are broadly relevant to human medulloblastoma. The human disease exhibits more
complex patterns of metastases than are observed in mice, and is classified into
four molecular subgroups (WNT, SHH, Group 3 and Group 4), which each display
distinct biological and clinical characteristics’. The Ptch”” mice used by
Wu et al.’ develop SHH-associated medulloblastomas’; similar mutagenesis-driven
approaches using existing mouse models of other medulloblastoma disease groups,
such as WNT’, might prove informative.

Perhaps the most urgent question arising from this study’ is whether the genetic
differences between the two disease compartments lead to distinct biological
features that make them respond differently to treatment. In mice, these
compartments remain entities characterized only at the genomic level, whose
biological and therapeutic importance is untested. In humans, clinical-trial data
show’ that primary and metastatic sites respond similarly to current therapies
(with cure achieved at both sites) in around 60% of children with metastatic
disease, but a more objective assessment of treatment response is confounded by
the fact that primary tumours are mostly removed by surgery before treatment. Wu
et al. provide initial evidence that the tumour compartments may respond
differently to current therapies in certain patients, but they rightly caution
that these effects could also relate to clinical factors such as radiotherapy
being delivered at different intensities to different tumour sites.

Some of the mutations identified in Wu and colleagues’ experiments may also
reveal biological processes or pathways that could offer drug targets for the
improved treatment of primary tumours, metastases, or both. The new mouse strains
provide excellent models in which to test this possibility. The multitude and
variety of mutations described by Wu et al.' are noteworthy, but the next
challenge is to determine which of them can drive tumour development, which are
therapeutically relevant, and which occur at sufficient frequency in the human
disease to warrant their pursuit as potential targets. The authors justifiably
reason that targets that are common to primary tumours and metastases, in both
humans and mice, are those most attractive for further development. However, on
the basis of their current data, only one cellular pathway, insulin-dependent
signalling, meets these criteria.

Providing answers to all these questions will require further biological
investigation across species, as well as clinical studies. An additional
challenge is posed by the fact that there are fewer than 700 cases of
medulloblastoma per year in Europe. More routine biopsy and characterization of
human metastases will be essential, and the impetus and ethical justification
for such a fundamental change to clinical practice will, at least in part, come
from experimental studies such as those presented here. Time will tell whether
this tale of Sleeping Beauty and mice develops into a clinically relevant human
paradigm.